Abstract. We present a formalization of modern SAT solvers and their properties in a 
form of abstract state transition systems. SAT solving procedures are described as transi- 
tion relations over states that represent the values of the solver's global variables. Several 
different SAT solvers are formalized, including both the classical DPLL procedure and its 
state-of-the-art successors. The formalization is made within the Isabelle/HOL system and 
the total correctness (soundness, termination, completeness) is shown for each presented 
system (with respect to a simple notion of satisfiability that can be manually checked). 
The systems are defined in a general way and cover procedures used in a wide range of 
modern SAT solvers. Our formalization builds upon previous work on state transition
systems for SAT, but it gives machine-verifiable proofs, somewhat more general specifica- 
tions, and weaker assumptions that ensure the key correctness properties. The presented 
proofs of formal correctness of the transition systems can be used as a key building block 
in proving correctness of SAT solvers by using other verification approaches. 



1. Introduction 

The problem of checking propositional satisfiability (SAT) is one of the central problems
in computer science. It is the problem of deciding if there is a valuation of variables under
which a given propositional formula (in conjunctive normal form) is true. SAT was the first
problem that was proved to be NP-complete [Coo71] and it still holds a central position in
the field of computational complexity. SAT solvers, procedures that solve the SAT problem, 
are successfully used in many practical applications such as electronic design automation, 
software and hardware verification, artificial intelligence, and operations research. 

Most state-of-the-art complete SAT solvers are essentially based on a branch and backtrack
procedure called Davis-Putnam-Logemann-Loveland or the DPLL procedure [DP60, DLL62].
Modern SAT solvers usually also employ (i) several conceptual, high-level algorithmic
additions to the original DPLL procedure, (ii) smart heuristic components, and (iii)
better low-level implementation techniques. Thanks to these, spectacular improvements in 
the performance of SAT solvers have been achieved and nowadays SAT solvers can decide 
satisfiability of CNF formulae with tens of thousands of variables and millions of clauses. 

1998 ACM Subject Classification: F.3.1, F.4.1. 

The tremendous advance in the SAT solving technology has not been accompanied
by corresponding theoretical results about the solver correctness. Descriptions of new
procedures and techniques are usually given in terms of implementations, while correctness 
arguments are either not given or are given only in outlines. This gap between practical 
and theoretical progress needs to be reduced and first steps in that direction have been 
made only recently, leading to the ultimate goal of having modern SAT solvers that are 
formally proved correct. That goal is vital since SAT solvers are used in applications that 
are very sensitive (e.g., software and hardware verification) and their misbehaviour could 
be both financially expensive and dangerous from the aspect of security. Ensuring trusted 
SAT solving can be achieved by two approaches. 

One approach for achieving a higher level of confidence in SAT solvers' results,
successfully used in recent years, is proof-checking [ZM03, GN03, Gel07, WA09, DFMS10].
In this approach, solvers are modified so that they output not only sat or unsat answers,
but also justification for their claims (models for satisfiable instances and proof objects
for unsatisfiable instances) that are then checked by independent proof-checkers.
Proof-checking is relatively easy to implement, but it has some drawbacks. First, justification for
every solved SAT instance has to be verified separately. Also, generating unsatisfiability
proofs introduces some overhead to the solver's running time, proofs are typically large and
may consume gigabytes of storage space, and proof-checking itself can be time consuming
[Gel07]. Since proof-checkers have to be trusted, they must be very simple programs so they
can be "verified" by code inspection.^1 On the other hand, in order to be efficient, they must
use specialized functionality of the underlying operating system which reduces the level of
their reliability (e.g., the proof checker used in the SAT competitions uses Linux's mmap
functionality [Gel07]).

The other approach for having trusted solvers' results is to verify the SAT solver itself, 
instead of checking each of its claims. This approach is very demanding, since it requires 
formal analysis of the complete solver's behaviour. In addition, whenever the implementation
of the solver changes, the correctness proofs must be adapted to reflect the changes.
Still, in practice, the core solving procedure is usually stable and stays fixed, while only 
heuristic components frequently change. The most challenging task is usually proving the 
correctness of the core solving procedures, while heuristic components only need to satisfy 
relatively simple properties that are easily checked. This approach also gives the following
benefits: 

• Although the overheads of generating unsatisfiability proofs during solving are not
unmanageable, in many applications they can be avoided if the solver itself is trusted.^2

• Verification of modern SAT solvers could help in better theoretical understanding of how 
and why they work. A rigorous analysis and verification of modern SAT solvers may 
reveal some possible improvements in underlying algorithms and techniques which can 
influence and improve other solvers as well. 

• Verified SAT solvers can serve as trusted kernel checkers for verifying results of other 
untrusted verifiers such as BDDs, model checkers, and SMT solvers. Also, verification of 
some SAT solver modules (e.g., Boolean constraint propagation) can serve as a basis for
creating both verified and efficient proof-checkers for SAT. 

^1 Alternatively, proof-checkers could be formally verified by a proof assistant, and then their correctness
would rely on the correctness of the proof assistant.

^2 In some applications, proofs of unsatisfiability are still necessary as they are used, for example, for
extracting unsatisfiable cores and interpolants.

Figure 1: Different approaches for SAT solver verification



In order to prove correctness of a SAT solver, it has to be formalized in some meta-theory 
so its properties can be analyzed in a rigorous mathematical manner. In order to achieve 
the desired highest level of trust, formalization in a classical "pen-and-paper" fashion is not 
satisfactory and, instead, a mechanized and machine-checkable formalization is preferred. 
The formal specification of a SAT solver can be made in several ways (illustrated in Figure
1, each with an appropriate verification paradigm and each having its own advantages and
disadvantages, described in the following text).

Verification of abstract state transition system: State transition systems are an 
abstract and purely mathematical way of specifying program behaviour. Using this ap- 
proach, the SAT solver's behaviour is modelled by transitions between states that repre- 
sent the values of the solver's global variables. Transitions can be made only by following 
precisely defined transition rules. Proving correctness of state transition systems can be 
performed by the standard mathematical apparatus. There are state transition systems 
describing the top-level architecture of the modern DPLL-based SAT solvers (and related 
SMT solvers) [KG07, NOT06] and their correctness has been informally shown.

The main advantage of the abstract state transition systems is that they are mathe- 
matical objects, so it is relatively easy to make their formalization within higher-order 
logic and to formally reason about them. Also, their verification can be a key building 
block for other verification approaches. Disadvantages are that the transition systems 
do not specify many details present in modern solver implementations and that they are 
not directly executable. 

Verified implementation within a proof assistant: A program's behaviour can be
specified within the higher-order logic of a proof assistant (regarded as a purely functional 
programming language). This approach is often called shallow embedding into HOL. 
Specifications may vary from very abstract ones to detailed ones covering most details 
present in the real SAT solver's code. The level of details can incrementally be increased 
(e.g., by using a datatype refinement). Having the specification inside the logic, its 
correctness can be proved again by using the standard mathematical apparatus (mainly 
induction and equational reasoning). Based on the specification, executable functional 
programs can be generated by means of code extraction — the term language of the logic 
within the proof assistant is identified with the term language of the target language and 
the verified program correctness is transferred to the exported program, up to simple 
transformation rules. 

Advantages of using the shallow embedding are that, once the solver is defined within
the proof assistant, it is possible to verify it directly inside the logic and a formal model 
of the operational or denotational semantics of the language is not required. Also, ex- 
tracted executable code can be trusted with a very high level of confidence. On the other 
hand, the approach requires building a fresh implementation of a SAT solver within the 
logic. Also, since higher-order logic is a pure functional language, it is not well suited to
modelling imperative data-structures and their destructive updates. Special techniques
must be used to have mutable data-structures and, consequently, an efficient generated
code.

Verification of the real implementations: The most demanding approach for verifying
a SAT solver is to directly verify the full real-world solver code. Since SAT solvers are 
usually implemented in imperative programming languages, verifying the correctness of 
implementation can be made by using the framework of Hoare logic [Hoa69], a formal
system for reasoning about programs written in imperative programming languages. The 
program behaviour can then be described in terms of preconditions and postconditions 
for pieces of code. Proving the program correctness is made by formulating and proving
verification conditions. For instance, Isabelle/HOL provides a formal verification
environment for sequential imperative programs [Sch06].

The main benefit of using the Hoare style verification is that it enables reasoning 
about the imperative code, which is the way that most real-world SAT solvers are im- 
plemented. However, since real code is overwhelmingly complex, simpler approximations 
are often made and given in pseudo-programming languages. This can significantly sim- 
plify the implementation, but leaves a gap between the correctness proof and the real 
implementation. 

In this paper we focus on the first verification approach as it is often suitable to separate 
the verification of the abstract algorithms and that of their specific implementations.^3 In
addition, state transition systems, as the most abstract specifications, cover the widest 
range of existing SAT solver implementations. Moreover, the reasoning used in verifying 
abstract state transition systems for SAT can serve as a key building block in verification 
of more detailed descriptions of SAT solvers using the other two approaches described 
above (as illustrated by Figure 1). Indeed, within our SAT verification project [MJ09], we
have already applied these two approaches [Mar09, Mar10, MJ10], and in both cases the
correctness arguments were mainly reduced to correctness of the corresponding abstract 
state transition systems. These transition systems and their correctness proofs are presented 
in this paper for the first time, after they evolved to some extent through application within 
the other two verification approaches. 

The methodology that we use in this paper for the formalization of SAT solvers via 
transition systems is incremental refinement: the formalization begins with a most basic 
specification, which is then refined by introducing more advanced techniques, while pre- 
serving the correctness. This incremental approach proves to be a very natural approach 
in formalizing complex software systems. It simplifies understanding of the system and 
reduces the overall verification effort. Each of the following sections describes a separate 
abstract state transition system. Although, formally viewed, all these systems are independent,
each new system extends the previous one and there are tight connections between
them. Therefore, we do not present each new system from scratch, but only give additions
to the previous one. We end up with a system that rather precisely describes modern SAT
solvers, including advanced techniques such as backjumping, learning, conflict analysis,
forgetting and restarting. The systems presented are related to existing solvers, their abstract
descriptions and informal correctness proofs.

^3 A recent example is the L4 verified OS kernel, where a shallowly embedded Haskell specification of the
kernel is verified, and then the C code is shown to implement the Haskell specification, yielding a natural
separation of concepts and issues [Kle10].

The paper is accompanied by a full formalization developed within the Isabelle/HOL 
proof assistant.^4 The full version of the paper^5 contains an appendix with informal proofs
of all lemmas used. All definitions, lemmas, theorems and proofs of top-level statements 
given in the paper correspond to their Isabelle counterparts, and here are given in a form 
accessible not only to Isabelle users, but to a wider audience. 

The main challenge in each large formalization task is to define basic relevant notions in 
appropriate terms, build a relevant theory and a suitable hierarchy of lemmas that facilitates 
constructing top-level proofs. Although in this paper we do not discuss all decisions made 
in the above directions, the final presented material is supposed to give the main motivating 
ideas and, implicitly, to illustrate a proof management technology that was used. The main 
purpose of the paper is to give a clear picture of central ideas relevant for verification of 
SAT transition systems, hopefully interesting both to SAT developers and to those involved 
in formalization of mathematics. 

The main contributions of this paper are the following. 

• SAT solving process is introduced by a hierarchical series of abstract transition systems, 
ending up with the state-of-the-art system. 

• Formalization and mechanical verification of properties of the abstract transition systems 
for SAT are performed (within this, invariants and well-founded relations relevant for 
termination are clearly given; conditions for soundness, completeness, and termination 
are clearly separated). Taking advantage of this formalization, different real-world SAT
solvers can be verified, using different verification approaches. 

• First proofs (either informal or formal) of some properties of modern SAT solvers (e.g., 
termination condition for frequent restarting) are given, providing deeper understanding 
of the solving process. 

The rest of the paper is organized as follows: In Section 2 some background on SAT solving,
abstract state transition systems, and especially abstract state transition systems for SAT is
given. In Section 3 basic definitions and examples of propositional logic and CNF formulae
are given. In Section 4 a system corresponding to basic DPLL search is formalized. In
Section 5 that system is modified and backtracking is replaced by more advanced
backjumping. In Section 6 the system is extended by clause learning and forgetting. In Section
7 and Section 8 a system with conflict analysis and a system with restarting and forgetting
are formalized. In Section 9 we discuss related work and our contributions. In Section 10
final conclusions are drawn.

2. Background 

In this section we give a brief, informal overview of the SAT solving process, abstract
state transition systems and abstract state transition systems for SAT. The paper does
not intend to be a tutorial on modern DPLL-based SAT solving techniques; the rest
of the paper contains only some brief explanations and assumes the relevant background
knowledge (more details and tutorials on modern SAT solving technology can be found in
other sources, e.g., [BHMW09, Mar09]).

^4 The whole presented formalization is available from AFP [Mar08] and, the latest version, from
http://argo.matf.bg.ac.rs

^5 The full version of the paper is available from http://argo.matf.bg.ac.rs

2.1. SAT Solving. SAT solvers are decision procedures for the satisfiability problem for
propositional formulae in conjunctive normal form (CNF). State-of-the-art SAT solvers are
mainly based on a branch-and-backtrack procedure called DPLL (Davis-Putnam-Logemann-Loveland)
[DP60, DLL62] and its modern successors. The original DPLL procedure (shown
in Figure 2) combines backtrack search with some basic, but efficient inference rules.

function dpll (F : Formula) : (SAT, UNSAT)
begin
    if F is empty then return SAT
    else if there is an empty clause in F then return UNSAT
    else if there is a pure literal l in F then return dpll(F[l → ⊤])
    else if there is a unit clause [l] in F then return dpll(F[l → ⊤])
    else begin
        select a literal l occurring in F
        if dpll(F[l → ⊤]) = SAT then return SAT
        else return dpll(F[l → ⊥])
    end
end

Figure 2: The original DPLL procedure 

The search component selects a branching literal l occurring in the formula F, and tries to
satisfy the formula obtained by replacing l with ⊤ and simplifying afterwards. If the
simplified formula is satisfiable, so is the original formula F. Otherwise, the formula obtained
from F by replacing l with ⊥ and by simplifying afterwards is checked for satisfiability and
it is satisfiable if and only if the original formula F is satisfiable. This process stops if the
formula contains no clauses or if it contains an empty clause. A very important aspect of
the search process is the strategy for selecting literals for branching: while not important
for the correctness of the procedure, this strategy can have a crucial impact on efficiency.

The simple search procedure is enhanced with several simple inference mechanisms. The 
unit clause rule is based on the fact that if there is a clause with a single literal present in 
F, its literal must be true in order to satisfy the formula (so there is no need for branching 
on that literal). The pure literal rule is based on the fact that if a literal occurs in the
formula, but its opposite literal does not, then, if the formula is satisfiable, that literal
is true in at least one of its models. These two rules are not necessary for completeness,
although they have a significant impact on efficiency.
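
The recursive procedure of Figure 2 can be sketched in Python as follows. This is only an illustration under assumptions that are not part of the paper: literals are represented as nonzero integers (with -x standing for the opposite of x), clauses as frozensets of literals, and the names assign_true and dpll are hypothetical.

```python
from typing import FrozenSet, List

Clause = FrozenSet[int]          # a literal is a nonzero int; -x is the opposite of x
Formula = List[Clause]


def assign_true(formula: Formula, lit: int) -> Formula:
    """Return F[lit -> T]: drop satisfied clauses, delete the opposite literal."""
    return [c - {-lit} for c in formula if lit not in c]


def dpll(formula: Formula) -> bool:
    """Return True (SAT) or False (UNSAT), following the scheme of Figure 2."""
    if not formula:                            # no clauses left
        return True
    if any(len(c) == 0 for c in formula):      # an empty clause cannot be satisfied
        return False
    literals = {l for c in formula for l in c}
    for l in literals:                         # pure literal rule
        if -l not in literals:
            return dpll(assign_true(formula, l))
    for c in formula:                          # unit clause rule
        if len(c) == 1:
            (l,) = c
            return dpll(assign_true(formula, l))
    l = next(iter(literals))                   # branching; the choice does not affect correctness
    return dpll(assign_true(formula, l)) or dpll(assign_true(formula, -l))
```

The branching literal is picked arbitrarily here; a real solver would plug its selection heuristic in at that point.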

Passing valuations instead of modifying the formula. In the original DPLL procedure, 
the formula considered is passed as a function argument, and modified throughout recursive 
calls. This is unacceptably inefficient for huge propositional formulae and can be replaced 
by a procedure that maintains a current (partial) valuation M and, rather than modifying 
the formula, keeps the formula constant and checks its value against the current valuation 
(see Figure 3). The inference rules used in the original procedure must be adapted to fit
this variant of the algorithm. The unit clause rule then states that if there is a clause in
F such that all its literals, except exactly one, are false in M, and that literal is undefined
in M, then this literal must be added to M in order to satisfy this clause. The pure literal
rule turns out to be too expensive in this context, so modern solvers typically do not use it.

function dpll (M : Valuation) : (SAT, UNSAT)
begin
    if M ⊨¬ F (i.e., F is false in M) then return UNSAT
    else if M is total wrt. the variables of F then return SAT
    else if there is a unit clause (i.e., a clause
        l ∨ l₁ ∨ ... ∨ lₖ in F s.t. l, l̄ ∉ M, l̄₁, ..., l̄ₖ ∈ M) then return dpll(M ∪ {l})
    else begin
        select a literal l s.t. l ∈ F, l, l̄ ∉ M
        if dpll(M ∪ {l}) = SAT then return SAT
        else return dpll(M ∪ {l̄})
    end
end



Figure 3: DPLL procedure with valuation passing 
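
A possible Python rendering of the valuation-passing variant is given below. It is a sketch under assumed conventions (literals as nonzero integers, -x for the opposite of x, the valuation M as a frozenset of literals); the formula is passed along unchanged, and only the valuation grows.

```python
from typing import FrozenSet, List

Formula = List[FrozenSet[int]]   # literals are nonzero ints; -x is the opposite of x


def dpll(formula: Formula, m: FrozenSet[int] = frozenset()) -> bool:
    """DPLL that keeps the formula fixed and extends a partial valuation m."""
    # UNSAT branch: some clause has all of its literals false in m.
    if any(all(-l in m for l in c) for c in formula):
        return False
    variables = {abs(l) for c in formula for l in c}
    if all(v in m or -v in m for v in variables):   # m is total and falsifies no clause
        return True
    for c in formula:                               # unit clause rule
        if any(l in m for l in c):                  # clause already true, so not unit
            continue
        undefined = [l for l in c if -l not in m]
        if len(undefined) == 1:
            return dpll(formula, m | {undefined[0]})
    # Branch on the smallest undefined variable (an arbitrary illustrative choice).
    l = next(v for v in sorted(variables) if v not in m and -v not in m)
    return dpll(formula, m | {l}) or dpll(formula, m | {-l})
```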



Non-recursive implementation. To gain efficiency, modern SAT solvers implement
DPLL-like procedures in a non-recursive fashion. Instead of passing arguments through
recursive calls, both the current formula F and the current partial valuation M are kept as
global objects. The valuation acts as a stack and is called assertion trail. Since the trail
represents a valuation, it must contain neither repeated nor opposite literals (i.e., it is always
distinct and consistent). Literals are added to the stack top (asserting) or removed from the
stack top (backtracking). The search begins with an empty trail. During the solving process,
the solver selects literals undefined in the current trail M and asserts them, marking them
as decision literals. Decision literals partition the trail into levels, and the level of a literal
is the number of decision literals that precede that literal in the trail. After each decision,
unit propagation is exhaustively applied and unit literals are asserted to M, but as implied
literals (since they are not arbitrary decisions). This process repeats until either (i) a clause
in F is found which is false in the current trail M (this clause is called a conflict clause)
or (ii) all the literals occurring in F are defined in M and no conflict clause is found in
F. In the case (i), a conflict reparation (backtracking) procedure must be applied. In the
basic variant of the conflict reparation procedure, the last decision literal l and all literals
after it are backtracked from M, and the opposite literal of l is asserted, also as an implied
literal. If there is no decision literal in M when a conflict is detected, then the formula F
is unsatisfiable. In the case (ii), the formula is found to be satisfiable and M is its model.
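
The loop described above can be illustrated by the following Python sketch. It is not taken from any particular solver: the trail is assumed to be a list of (literal, is_decision) pairs, literals are nonzero integers, and the function name solve is hypothetical. Clause inspection is done naively by scanning the whole formula.

```python
from typing import List, Optional, Tuple

Clause = List[int]                     # literals are nonzero ints; -x is the opposite of x
Trail = List[Tuple[int, bool]]         # (literal, is_decision), used as a stack


def solve(formula: List[Clause]) -> Optional[List[int]]:
    """Return a model as a list of literals, or None if the formula is unsatisfiable."""
    trail: Trail = []
    variables = sorted({abs(l) for c in formula for l in c})

    def value(l: int) -> Optional[bool]:
        for lit, _ in trail:
            if lit == l:
                return True
            if lit == -l:
                return False
        return None                     # l is undefined in the trail

    while True:
        if any(all(value(l) is False for l in c) for c in formula):
            # Basic conflict reparation: pop everything after the last decision ...
            while trail and not trail[-1][1]:
                trail.pop()
            if not trail:
                return None             # conflict with no decision literal: unsatisfiable
            decision, _ = trail.pop()   # ... and the decision itself,
            trail.append((-decision, False))   # then assert its opposite as implied
            continue
        unit = None
        for c in formula:
            if any(value(l) is True for l in c):
                continue
            undefined = [l for l in c if value(l) is None]
            if len(undefined) == 1:
                unit = undefined[0]
                break
        if unit is not None:
            trail.append((unit, False))        # implied literal
            continue
        free = [v for v in variables if value(v) is None]
        if not free:
            return [lit for lit, _ in trail]   # all variables defined, no conflict
        trail.append((free[0], True))          # decision literal
```

Because the conflict check and unit propagation are retried before every decision, propagation is exhaustive in the sense used in the text.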

Modern DPLL enhancements. For almost half of a century, DPLL-based SAT procedures
have undergone various modifications and improvements. Accounts of the evolution
of SAT solvers can be found in recent literature [BHMW09, GKSS07]. Early SAT solvers
based on DPLL include Tableau (NTAB), POSIT, 2cl and CSAT, among others. In the
mid 1990's, a new generation of solvers such as GRASP [MSS99], SATO [Zha97], Chaff
[MMZ+01], and BerkMin [GN02] appeared, and in these solvers a lot of attention was
paid to optimisation of various aspects of the DPLL algorithm. Some influential modern
SAT solvers include MiniSat [ES04] and PicoSAT [Bie08].

A significant improvement over the basic search algorithm is to replace the simple 
conflict reparation based on backtracking by a more advanced one based on conflict driven 
backjumping, first proposed in the Constraint Satisfaction Problem (CSP) domain [BHZ06].
Once a conflict is detected, a conflict analysis procedure finds a sequence of decisions (often
buried deeper in the trail) that eventually led to the current conflict. Conflict analysis can
be described in terms of graphs and the backjump clauses are constructed by traversing
a graph called implication graph [MSS99]. The process can also be described in terms
of resolution that starts from the conflict clause and continues with clauses that caused
unit propagation of literals in that clause [ZM02]. There are several strategies for conflict
analysis, leading to different backjump clauses [BHMW09]. Most conflict analysis strategies
are based on the following scheme:

(1) Conflict analysis starts with a conflict clause (i.e., the clause from F detected to be 
false in M). The conflict analysis clause C is set to the conflict clause. 

(2) Each literal from the current conflict analysis clause C is false in the current trail M
and is either a decision literal or a result of a propagation. For each propagated literal
l it is possible to find a clause (reason clause) that caused l to be propagated. The
propagated literals from C are then replaced (we say explained) by the remaining
literals from their reason clauses. The process of conflict analysis then continues.

The described procedure continues until some termination condition is met, and the back- 
jump clause is then constructed. Thanks to conflict driven backjumping, a lot of unnecessary 
work can be saved compared to the simple backtrack operation. Indeed, the simple back- 
tracking would have to consider all combinations of values for all decision literals between 
the backjump point and the last decision, while they are all irrelevant for the particular 
conflict. 
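
The scheme above can be sketched as a resolution loop. The Python fragment below is an illustration only: it assumes that each trail entry records its decision level and its reason clause (None for decisions), and it picks one common termination condition, stopping as soon as the clause contains a single literal of the highest level. Other strategies mentioned in the literature would stop at a different point. The names analyze and TrailEntry are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Clause = Set[int]   # literals are nonzero ints; -x is the opposite of x
# Assumed trail entry: (literal, decision level, reason clause or None for a decision).
TrailEntry = Tuple[int, int, Optional[Clause]]


def analyze(trail: List[TrailEntry], conflict: Clause) -> Clause:
    """Explain propagated literals until one literal of the top level remains."""
    level = {lit: lvl for lit, lvl, _ in trail}
    top = max(level[-l] for l in conflict)   # every literal of C is false in the trail
    c = set(conflict)
    # Walk the trail backwards so that literals are explained in reverse assertion order.
    for lit, _, reason in reversed(trail):
        at_top = [l for l in c if level[-l] == top]
        if len(at_top) <= 1:
            break                            # assumed termination condition
        if -lit in c and reason is not None:
            # Resolution step: replace -lit by the remaining literals of its reason.
            c = (c - {-lit}) | (reason - {lit})
    return c
```

For instance, with decisions 1 (level 1) and 3 (level 2), propagations 2 from {-1, 2}, 4 from {-3, 4} and 5 from {-2, -4, 5}, and the conflict clause {-2, -4, -5}, a single explanation step yields the clause {-2, -4}, which has exactly one literal at level 2.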

The result of conflict analysis is usually a clause that is a logical consequence of F and
that explains a particular conflict that occurred. If this clause was added to F, then this
type of conflict would never occur again during search (even in some other contexts, i.e., in
some other parts of the search space). This is why solvers usually perform clause learning
and append (redundant) deduced clauses to F. However, if the formula F becomes too
large, some clauses have to be forgotten. Conflict driven backjumping with clause learning
was first incorporated into a SAT solver in the mid 1990's by Silva and Sakallah in GRASP
[MSS99] and by Bayardo and Schrag in rel_sat [BS97]. DPLL-based SAT solvers employing
conflict driven clause learning are often called CDCL solvers.

Another significant improvement is to empty the trail and restart the search from time 
to time, in a hope that it would restart in an easier part of the search space. Randomized 
restarts were introduced by Gomes et al. [GSK98] and further developed by Baptista and
Marques-Silva [BMS00].

One of the most demanding operations during solving is the detection of false and unit 
clauses. Whenever a literal is asserted, the solver must check F for their presence. To aid 
this operation, smart data structures with corresponding implementations are used. One 
of the most advanced ones is the two-watched literal scheme, introduced by Moskewicz et
al. in their solver zChaff [MMZ+01].

2.2. Abstract State Transition Systems. An abstract state transition system for an
imperative program consists of a set of states S describing possible values of the program's
global variables and a binary transition relation → ⊆ S × S. The transition relation is
usually the union of smaller transition relations →ᵢ, called the transition rules. If s →ᵢ s'
holds, we say that the rule i has been applied to the state s and the state s' has been
obtained. Transition rules are denoted as:

\[
\text{Rulename}:\quad \dfrac{\mathit{cond}_1 \quad \ldots \quad \mathit{cond}_k}{\mathit{effect}}
\]

Above the line are the conditions cond₁, ..., condₖ that the state s must meet in order
for the rule to be applicable and the effect denotes the effect that must be applied to the
components of s in order to obtain s'.

More formally, transition rules can be defined as relations over states: 

Rulename s s' iff φ

where φ denotes a formula that describes conditions on s that have to be met and the
relationship between s and s'.

Some states are distinguished as initial states. An initial state usually depends on the 
program input. A state is a final state if no transition rules can be applied. Some states 
(not necessarily final) are distinguished as the outcome states carrying certain resulting 
information. If a program terminates in a final outcome state, it emits a result determined 
by this state. For a decision procedure (such as a SAT solver), there are only two possible 
outcomes: yes (sat) or no (unsat). A state transition system is considered to be correct if
it has the following properties: 

Termination: from each initial state s₀, the execution eventually reaches a final state (i.e.,
there are no infinite chains s₀ → s₁ → ...).

Soundness: the program always gives correct answers, i.e., if the program, starting with
an input I from an initial state s₀, reaches a final outcome state with a result O, then O
is the desired result for the input I.

Completeness: the program always gives an answer if it terminates, i.e., all final states
are outcome states.
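
To make these notions concrete, here is a small Python sketch of a transition system that is unrelated to SAT and was chosen only for brevity: states are pairs of integers and the two hypothetical rules, sub_left and sub_right, subtract the smaller component from the larger one. Each rule is modelled as a function that returns the successor state when its conditions hold and None otherwise.

```python
from typing import Callable, List, Optional, Tuple

State = Tuple[int, int]
# A rule returns the successor state if its conditions hold, otherwise None.
Rule = Callable[[State], Optional[State]]


def sub_left(s: State) -> Optional[State]:
    a, b = s
    return (a - b, b) if a > b > 0 else None


def sub_right(s: State) -> Optional[State]:
    a, b = s
    return (a, b - a) if b > a > 0 else None


RULES: List[Rule] = [sub_left, sub_right]


def successors(s: State) -> List[State]:
    """All states s' such that s -> s' under the union of the rules."""
    return [t for rule in RULES if (t := rule(s)) is not None]


def run(s: State) -> State:
    """Apply rules until a final state (one with no applicable rule) is reached."""
    while (nxt := successors(s)):
        s = nxt[0]
    return s
```

Starting from a pair of positive integers, every rule application strictly decreases a + b, which gives termination. The final states reachable from such initial states have equal components, and the common value is the greatest common divisor of the input, which corresponds to completeness and soundness in the sense defined above.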



2.3. Abstract State Transition Systems for SAT. Two transition rule systems that 
model DPLL-based SAT solvers and related SMT solvers have been published recently. 
Both systems present a basis of the formalization described in this paper. The system of 
Krstić and Goel [KG07] gives a more detailed description of some parts of the solving process
(particularly the conflict analysis phase) than the one given by Nieuwenhuis, Oliveras and
Tinelli [NOT06], so we present its rules in Figure 4. In this system, along with the formula
F and the trail M, the state of the solver is characterized by the conflict analysis set C
that is either a set of literals (i.e., a clause) or the distinguished symbol no_cflct. Input to
the system is an arbitrary set of clauses F₀. The solving starts from an initial state in which
F = F₀, M = [], and C = no_cflct.

The Decide rule selects a literal from a set of decision literals L and asserts it to the 
trail as a decision literal. The set L is typically just the set of all literals occurring in the 
input formulae. However, in some cases a smaller set can be used (based on some specific 
knowledge about the encoding of the input formula). Also, there are cases when this set is 
in fact larger than the set of all variables occurring in the input formula.^6

The UnitPropag rule asserts a unit literal l to the trail M as an implied literal. This
reduces the search space since only one valuation for l is considered.

^6 For example, the standard DIMACS format for SAT requires specifying the number of variables and the
clauses that make the formula, without guarantees that every variable eventually occurs in the formula. 
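
\[
\begin{array}{ll}
\textsf{Decide}: &
  \dfrac{l \in L \qquad l, \overline{l} \notin M}
        {M := M\, l^{d}} \\[2ex]
\textsf{UnitPropag}: &
  \dfrac{l \vee l_1 \vee \ldots \vee l_k \in F \qquad \overline{l}_1, \ldots, \overline{l}_k \in M \qquad l, \overline{l} \notin M}
        {M := M\, l^{i}} \\[2ex]
\textsf{Conflict}: &
  \dfrac{C = \mathit{no\_cflct} \qquad \overline{l}_1 \vee \ldots \vee \overline{l}_k \in F \qquad l_1, \ldots, l_k \in M}
        {C := \{l_1, \ldots, l_k\}} \\[2ex]
\textsf{Explain}: &
  \dfrac{l \in C \qquad l \vee \overline{l}_1 \vee \ldots \vee \overline{l}_k \in F \qquad l_1, \ldots, l_k \prec l}
        {C := C \cup \{l_1, \ldots, l_k\} \setminus \{l\}} \\[2ex]
\textsf{Learn}: &
  \dfrac{C = \{l_1, \ldots, l_k\} \qquad \overline{l}_1 \vee \ldots \vee \overline{l}_k \notin F}
        {F := F \cup \{\overline{l}_1 \vee \ldots \vee \overline{l}_k\}} \\[2ex]
\textsf{Backjump}: &
  \dfrac{C = \{l, l_1, \ldots, l_k\} \qquad \overline{l} \vee \overline{l}_1 \vee \ldots \vee \overline{l}_k \in F \qquad \mathit{level}\ l > m \geq \mathit{level}\ l_i}
        {C := \mathit{no\_cflct} \qquad M := M^{[m]}\, \overline{l}^{\,i}} \\[2ex]
\textsf{Forget}: &
  \dfrac{C = \mathit{no\_cflct} \qquad c \in F \qquad F \setminus c \vDash c}
        {F := F \setminus c} \\[2ex]
\textsf{Restart}: &
  \dfrac{C = \mathit{no\_cflct}}
        {M := M^{[0]}}
\end{array}
\]

Figure 4: Transition system for SAT solving by Krstić and Goel ($l_i \prec l_j$ denotes that the
literal $l_i$ precedes $l_j$ in M, $l^{d}$ denotes a decision literal, $l^{i}$ an implied literal, level l
denotes the decision level of a literal l in M, and $M^{[m]}$ denotes the prefix of M
up to the level m).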

The Conflict rule is applied when a conflict clause is detected. It initializes the conflict
analysis and the reparation procedure, by setting C to the set of literals of the conflict
clause. This set is further refined by successive applications of the Explain rule, which
essentially performs a resolution between the clause C and the clause that is the reason
of propagation of its literal. During the conflict analysis procedure, the clause C can be
added to F by the Learn rule. However, this is usually done only once, when there is
exactly one literal in C present at the highest decision level of M. In that case, the Backjump
rule can be applied. That resolves the conflict by backtracking the trail to a level (usually
the lowest possible) such that C becomes a unit clause with a unit literal l. In addition, unit
propagation of l is performed.

The Forget rule eliminates clauses. Namely, because of the learning process, the number 
of clauses in the current formula increases. When it becomes too large, detecting false and 
unit clauses becomes too demanding, so from time to time, it is preferable to delete from F 
some clauses that are redundant. Typically, only learnt clauses are forgotten (as they are 
always redundant). 

3. Underlying Theory 

As a framework of our formalization, higher-order logic is used, in a similar way as in the
system Isabelle/HOL [NPW02]. Formulae and logical connectives of this logic ($\wedge$, $\vee$, $\longrightarrow$,
$\longleftrightarrow$) are written in the standard way. Equality is denoted by $=$. Function applications are
written in prefix form, as in $f\ x_1 \ldots x_n$. Existential quantification is denoted by $\exists x.\ \ldots$
and universal quantification by $\forall x.\ \ldots$

In this section we will introduce definitions necessary for notions of satisfiability and 
notions used in SAT solving. Most of the definitions are simple and technical so we give 
them in a very dense form. They make the paper self-contained and can be used just for 
reference. 

The correctness of the whole formalization effort eventually relies on the definition of
satisfiable formulae, which is rather straightforward and easily checked by human inspection. 

3.1. Lists, Multisets, and Relations. We assume that the notions of ordered pairs, lists 
and (finite) sets are defined within the theory. Relations and their extensions are used 
primarily in the context of ordering relations and the proofs of termination. We will use 
standard syntax and semantics of these types and their operations. However, to aid our 
formalization, some additional operations are introduced. 

Definition 3.1 (Lists related).

• The first position of an element $e$ in a list $l$, denoted firstPos $e$ $l$, is the zero-based index
of the first occurrence of $e$ in $l$ if it occurs in $l$, or the length of $l$ otherwise.

• The prefix to an element $e$ of a list $l$, denoted by prefixTo $e$ $l$, is the list consisting of all
elements of $l$ preceding the first occurrence of $e$ (including $e$).

• The prefix before an element $e$ of a list $l$, denoted by prefixBefore $e$ $l$, is the list of all
elements of $l$ preceding the first occurrence of $e$ (not including $e$).

• An element $e_1$ precedes $e_2$ in a list $l$, denoted by $e_1 \prec_l e_2$, if both occur in $l$ and the first
position of $e_1$ in $l$ is less than the first position of $e_2$ in $l$.

• A list $p$ is a prefix of a list $l$ (denoted by $p \le l$) if there exists a list $s$ such that $l = p\,@\,s$.
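
The list operations of Definition 3.1 can be sketched in a few lines of Python. This is only an illustration under the assumption that lists are ordinary Python lists; the function names (`first_pos`, `prefix_to`, and so on) are hypothetical transliterations of the names used above and are not taken from the Isabelle/HOL sources.

```python
def first_pos(e, l):
    """Zero-based index of the first occurrence of e in l, or len(l) if absent."""
    return l.index(e) if e in l else len(l)


def prefix_to(e, l):
    """Elements of l up to and including the first occurrence of e."""
    return l[:first_pos(e, l) + 1]


def prefix_before(e, l):
    """Elements of l strictly before the first occurrence of e."""
    return l[:first_pos(e, l)]


def precedes(e1, e2, l):
    """e1 precedes e2 in l: both occur and e1 occurs first."""
    return e1 in l and e2 in l and first_pos(e1, l) < first_pos(e2, l)


def is_prefix(p, l):
    """p is a prefix of l, i.e. l = p @ s for some list s."""
    return l[:len(p)] == p
```

Note that, as in the definition, `first_pos` is total: for an element that does not occur, both prefix functions return the whole list.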

Definition 3.2 (Multiset). A multiset over a type $X$ is a function $S$ mapping $X$ to natural
numbers. A multiset is finite if the set $\{x \mid S(x) > 0\}$ is finite. The union of multisets $S$
and $T$ is a function defined as $(S \cup T)(x) = S(x) + T(x)$.

Definition 3.3 (Relations related).

• The composition of two relations $\rho_1$ and $\rho_2$ is denoted by $\rho_1 \circ \rho_2$. The $n$-th degree of
the relation $\rho$ is denoted by $\rho^{n}$. The transitive closure of $\rho$ is denoted by $\rho^{+}$, and the
transitive and reflexive closure of $\rho$ by $\rho^{*}$.

• A relation $\succ$ is well-founded iff:

    $\forall P.\ ((\forall x.\ (\forall y.\ x \succ y \longrightarrow P(y)) \longrightarrow P(x)) \longrightarrow (\forall x.\ P(x)))$

• If $\succ$ is a relation on $X$, then its lexicographic extension $\succ^{lex}$ is a relation on lists of $X$,
defined by:

    $s \succ^{lex} t$ iff $(\exists r.\ t = s\,@\,r \wedge r \ne [])\ \vee$
    $(\exists r\, s'\, t'\, a\, b.\ s = r\,@\,a\,@\,s' \wedge t = r\,@\,b\,@\,t' \wedge a \succ b)$

• If $\succ$ is a relation on $X$, then its multiset extension $\succ^{mult}$ is a relation defined over multisets
over $X$ (denoted by $\langle x_1, \ldots, x_n \rangle$). The relation $\succ^{mult}$ is a transitive closure of the relation
$\succ^{mult1}$, defined by:

    $S_1 \succ^{mult1} S_2$ iff $\exists S\, S_2'\, s_1.\ S_1 = S \cup \langle s_1 \rangle \wedge S_2 = S \cup S_2' \wedge$
    $\forall s_2.\ s_2 \in S_2' \longrightarrow s_1 \succ s_2$

• Let $\succ_x$ and $\succ_y$ be relations over $X$ and $Y$. Their lexicographic product, denoted by
$\succ_x \langle{*}lex{*}\rangle \succ_y$, is a relation $\succ$ on $X \times Y$ such that

    $(x_1, y_1) \succ (x_2, y_2)$ iff $x_1 \succ_x x_2 \vee (x_1 = x_2 \wedge y_1 \succ_y y_2)$

• Let $\succ_x$ be a relation on $X$, and for each $x \in X$ let $\succ_y^{x}$ be a relation over $Y$ (i.e., let
$\lambda x.\ \succ_y^{x}$ be a function mapping $X$ to relations on $Y$). Their parametrized lexicographic
product, denoted by $\succ_x \langle{*}lexP{*}\rangle \succ_y^{x}$, is a relation $\succ$ on $X \times Y$ such that

    $(x_1, y_1) \succ (x_2, y_2)$ iff $x_1 \succ_x x_2 \vee (x_1 = x_2 \wedge y_1 \succ_y^{x_1} y_2)$.

Proposition 3.4 (Properties of well-founded relations).

• A relation $\succ$ is well-founded iff

    $\forall Q.\ (\exists a \in Q) \longrightarrow (\exists a_{min} \in Q.\ (\forall a'.\ a_{min} \succ a' \longrightarrow a' \notin Q))$

• Let $f$ be a function and $\succ$ a relation such that $x \succ y \longrightarrow f\,x \succ' f\,y$. If $\succ'$ is well-founded,
then so is $\succ$.

• If $\succ$ is well-founded, then so is $\succ^{mult}$.

• Let $\succ_x$ be a well-founded relation on $X$ and for each $x \in X$ let $\succ_y^{x}$ be a well-founded
relation. Then $\succ_x \langle{*}lexP{*}\rangle \succ_y^{x}$ is well-founded.

3.2. Logic of CNF formulae.

Definition 3.5 (Basic types).

    Variable    natural number
    Literal     either a positive variable (Pos vbl) or a negative variable (Neg vbl)
    Clause      a list of literals
    Formula     a list of clauses
    Valuation   a list of literals
    Trail       a list of (Literal, bool) pairs

For the sake of readability, we will sometimes omit types and use the following naming
convention: literals (i.e., variables of the type Literal) are denoted by $l$ (e.g., $l, l', l_0, l_1, l_2, \ldots$),
variables by $vbl$, clauses by $c$, formulae by $F$, valuations by $v$, and trails by $M$.

Note that, in order to be closer to implementation (and to the standard solver input
format, DIMACS), clauses and formulae are represented using lists instead of sets (a
more detailed discussion on this issue is given in Section 9). Although a trail is not a list of
literals (but rather a list of (Literal, bool) pairs), for simplicity, we will often identify it with
its list of underlying literals, and we will treat trails as valuations. In addition, a trail can
be implemented not only as a list of (Literal, bool) pairs but also in some other equivalent way.
We abuse the notation and overload some symbols. For example, the symbol $\in$ denotes
both set membership and list membership, and it is also used to denote that a literal occurs
in a formula. The symbol vars is also overloaded and denotes the set of variables occurring in a
clause, in a formula, or in a valuation.

Footnote: Note that lexicographic product can be regarded as a special case of parametrized lexicographic product
(where the same $\succ_y$ is used for each $x \in X$).

Definition 3.6 (Literals and clauses related).

• The opposite literal of a literal $l$, denoted by $\overline{l}$, is defined by: $\overline{\mathrm{Pos}\ vbl} = \mathrm{Neg}\ vbl$, $\overline{\mathrm{Neg}\ vbl} =
\mathrm{Pos}\ vbl$.

• A formula $F$ contains a literal $l$ (i.e., a literal $l$ occurs in a formula $F$), denoted by $l \in F$,
iff $\exists c.\ c \in F \wedge l \in c$.

• The set of variables that occur in a clause $c$ is denoted by vars $c$. The set of variables that
occur in a formula $F$ is denoted by vars $F$. The set of variables that occur in a valuation
$v$ is denoted by vars $v$.

• The resolvent of clauses $c_1$ and $c_2$ over the literal $l$, denoted
resolvent $c_1$ $c_2$ $l$, is the clause $(c_1 \setminus l)\,@\,(c_2 \setminus \overline{l})$.

• A clause $c$ is a tautological clause, denoted by clauseTautology $c$, if it contains both a
literal and its opposite (i.e., $\exists l.\ l \in c \wedge \overline{l} \in c$).

• The conversion of a valuation $v$ to a formula is the list $\langle v \rangle$ that contains all single literal
clauses made of literals from $v$.
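
A minimal Python sketch of some of these notions follows. It assumes, purely for illustration, that literals are encoded as non-zero integers in the DIMACS style (so `Pos 3` is `3` and `Neg 3` is `-3`) and clauses as lists of such integers; the function names are hypothetical.

```python
def opposite(l):
    """Opposite literal: Pos vbl <-> Neg vbl."""
    return -l


def var(l):
    """Variable of a literal."""
    return abs(l)


def remove_all(c, l):
    """The clause c \\ l: c without any occurrence of l."""
    return [x for x in c if x != l]


def resolvent(c1, c2, l):
    """Resolvent of c1 and c2 over l: (c1 \\ l) @ (c2 \\ opposite l)."""
    return remove_all(c1, l) + remove_all(c2, opposite(l))


def clause_tautology(c):
    """A clause is tautological if it contains a literal and its opposite."""
    return any(opposite(l) in c for l in c)


def valuation_to_formula(v):
    """The formula <v>: one single-literal clause per literal of v."""
    return [[l] for l in v]
```

For instance, resolving `[-3, -6, -7]` with `[-2, -5, 7]` over `-7` yields `[-3, -6, -2, -5]`.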

Definition 3.7 (Semantics).

• A literal $l$ is true in a valuation $v$, denoted by $v \vDash l$, iff $l \in v$. A clause $c$ is true in a
valuation $v$, denoted by $v \vDash c$, iff $\exists l.\ l \in c \wedge v \vDash l$. A formula $F$ is true in a valuation $v$,
denoted by $v \vDash F$, iff $\forall c.\ c \in F \Rightarrow v \vDash c$.

• A literal $l$ is false in a valuation $v$, denoted by $v \vDash\!\neg\, l$, iff $\overline{l} \in v$. A clause $c$ is false in a
valuation $v$, denoted by $v \vDash\!\neg\, c$, iff $\forall l.\ l \in c \Rightarrow v \vDash\!\neg\, l$. A formula $F$ is false in a valuation
$v$, denoted by $v \vDash\!\neg\, F$, iff $\exists c.\ c \in F \wedge v \vDash\!\neg\, c$.

• $v \nvDash l$ ($v \nvDash c$ / $v \nvDash F$) denotes that $l$ ($c$ / $F$) is not true in $v$ (then we say that $l$ ($c$ / $F$)
is unsatisfied in $v$). $v \nvDash\!\neg\, l$ ($v \nvDash\!\neg\, c$ / $v \nvDash\!\neg\, F$) denotes that $l$ ($c$ / $F$) is not false in $v$ (then
we say that $l$ ($c$ / $F$) is unfalsified in $v$).

Definition 3.8 (Valuations and models).

• A valuation $v$ is inconsistent, denoted by inconsistent $v$, iff it contains both a literal and
its opposite, i.e., iff $\exists l.\ v \vDash l \wedge v \vDash \overline{l}$. A valuation is consistent, denoted by (consistent $v$),
iff it is not inconsistent.

• A valuation $v$ is total with respect to a variable set $Vbl$, denoted by total $v$ $Vbl$, iff
vars $v \supseteq Vbl$.

• A model of a formula $F$ is a consistent valuation under which $F$ is true. A formula $F$ is
satisfiable, denoted by sat $F$, iff it has a model, i.e., $\exists v.\ \text{consistent}\ v \wedge v \vDash F$.

• A clause $c$ is unit in a valuation $v$ with a unit literal $l$, denoted by isUnit $c$ $l$ $v$, iff $l \in c$,
$v \nvDash l$, $v \nvDash\!\neg\, l$, and $v \vDash\!\neg\, (c \setminus l)$ (i.e., $\forall l'.\ l' \in c \wedge l' \ne l \Rightarrow v \vDash\!\neg\, l'$).

• A clause $c$ is a reason for propagation of literal $l$ in valuation $v$, denoted by isReason $c$ $l$ $v$,
iff $l \in c$, $v \vDash l$, $v \vDash\!\neg\, (c \setminus l)$, and for each literal $l' \in (c \setminus l)$, the literal $\overline{l'}$ precedes $l$ in $v$.
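
The semantic notions of Definitions 3.7 and 3.8 translate almost literally into executable checks. The sketch below assumes, for illustration only, that literals are non-zero integers (opposite literal by negation) and that valuations, clauses, and formulae are Python lists; all names are hypothetical. The `sat` function enumerates only total valuations over the variables of the formula, which is enough for satisfiability because any model can be extended to a total consistent valuation.

```python
from itertools import product


def lit_true(v, l):
    return l in v


def lit_false(v, l):
    return -l in v


def clause_true(v, c):
    return any(lit_true(v, l) for l in c)


def clause_false(v, c):
    return all(lit_false(v, l) for l in c)


def formula_true(v, f):
    return all(clause_true(v, c) for c in f)


def formula_false(v, f):
    return any(clause_false(v, c) for c in f)


def consistent(v):
    return not any(-l in v for l in v)


def is_unit(c, l, v):
    """c is unit in v with unit literal l."""
    return (l in c and not lit_true(v, l) and not lit_false(v, l)
            and all(lit_false(v, x) for x in c if x != l))


def sat(f):
    """Brute-force satisfiability check (exponential; for small examples only)."""
    variables = sorted({abs(l) for c in f for l in c})
    for signs in product((True, False), repeat=len(variables)):
        v = [x if s else -x for x, s in zip(variables, signs)]
        if formula_true(v, f):
            return True
    return False
```

The brute-force `sat` mirrors the spirit of the paper's remark that the notion of satisfiability is simple enough to be checked by inspection; it is not meant as a solving procedure.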



Footnote: Note that the symbol $\vDash\!\neg$ is atomic, i.e., $v \vDash\!\neg\, F$ does not correspond to $v \vDash (\neg F)$, although it would be
the case if all propositional formulae (instead of CNF only) were considered.

Definition 3.9 (Entailment and logical equivalence).

• A formula $F$ entails a clause $c$, denoted by $F \vDash c$, iff $c$ is true in every model of $F$. A
formula $F$ entails a literal $l$, denoted by $F \vDash l$, iff $l$ is true in every model of $F$. A formula
$F$ entails a valuation $v$, denoted by $F \vDash v$, iff it entails all its literals, i.e., $\forall l.\ l \in v \Rightarrow F \vDash l$.
A formula $F_1$ entails a formula $F_2$, denoted by $F_1 \vDash F_2$, if every model of $F_1$ is a model
of $F_2$.

• Formulae $F_1$ and $F_2$ are logically equivalent, denoted by $F_1 \equiv F_2$, iff any model of $F_1$ is a
model of $F_2$ and vice versa, i.e., iff $F_1 \vDash F_2$ and $F_2 \vDash F_1$.
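
For small formulae, entailment can be checked by brute force. The sketch below is an illustration only: it assumes integer-encoded literals, hypothetical function names, and it enumerates all consistent (possibly partial) valuations over the variables involved, since models in the sense of Definition 3.8 need not be total.

```python
from itertools import product


def models(F, variables):
    """All consistent (possibly partial) valuations over `variables` that make F true."""
    for signs in product((1, -1, 0), repeat=len(variables)):
        # sign 1: variable true, -1: variable false, 0: variable undefined
        v = [s * x for s, x in zip(signs, variables) if s != 0]
        if all(any(l in v for l in c) for c in F):
            yield v


def entails_clause(F, c):
    """F entails c: c is true in every model of F."""
    variables = sorted({abs(l) for cl in F + [c] for l in cl})
    return all(any(l in v for l in c) for v in models(F, variables))


def entails_formula(F1, F2):
    """F1 entails F2: every model of F1 is a model of F2."""
    return all(entails_clause(F1, c) for c in F2)


def equivalent(F1, F2):
    return entails_formula(F1, F2) and entails_formula(F2, F1)
```

The enumeration is exponential in the number of variables, so this is usable only as a sanity check on examples of the size used in this paper.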

Definition 3.10 (Trails related).

• For a trail element $a$, element $a$ denotes the first (Literal) component and isDecision $a$
denotes the second (Boolean) component. For a trail $M$, elements $M$ denotes the list of
all its elements and decisions $M$ denotes the list of all its marked elements (i.e., of all its
decision literals).

• The last decision literal, denoted by lastDecision $M$, is the last marked element of the list
$M$, i.e., lastDecision $M$ = last (decisions $M$).

• decisionsTo $l$ $M$ is the list of all marked elements from a trail $M$ that precede the
first occurrence of the element $l$, including $l$ if it is marked, i.e., decisionsTo $l$ $M$ =
decisions (prefixTo $l$ $M$).

• The current level for a trail $M$, denoted by currentLevel $M$, is the number of marked
literals in $M$, i.e., currentLevel $M$ = length (decisions $M$).

• The decision level of a literal $l$ in a trail $M$, denoted by level $l$ $M$, is the number of marked
literals in the trail that precede the first occurrence of $l$, including $l$ if it is marked, i.e.,
level $l$ $M$ = length (decisionsTo $l$ $M$).

• prefixToLevel $level$ $M$ is the prefix of a trail $M$ containing all elements of $M$ with levels
less than or equal to $level$.

• The prefix before last decision, denoted by prefixBeforeLastDecision $M$, is a prefix of the
trail $M$ before its last marked element (not including it).

• The last asserted literal of a clause $c$, denoted by lastAssertedLiteral $c$ $M$, is the literal
from $c$ that is in $M$, such that no other literal from $c$ comes after it in $M$.

• The maximal level of a literal in the clause $c$ with respect to a trail $M$, denoted by
maxLevel $c$ $M$, is the maximum of all levels of literals from $c$ asserted in $M$.

Example 3.11. A trail $M$ could be $[+1^{i}, -2^{d}, +6^{i}, +5^{d}, -3^{i}, +4^{i}, -7^{d}]$. The symbol $+$
is written instead of the constructor Pos, the symbol $-$ instead of Neg. decisions $M$ =
$[-2^{d}, +5^{d}, -7^{d}]$, lastDecision $M$ = $-7$, decisionsTo $+4$ $M$ = $[-2^{d}, +5^{d}]$, and decisionsTo $-7$ $M$
= $[-2^{d}, +5^{d}, -7^{d}]$. level $+1$ $M$ = 0, level $+4$ $M$ = 2, level $-7$ $M$ = 3, currentLevel $M$ = 3,
prefixToLevel 1 $M$ = $[+1^{i}, -2^{d}, +6^{i}]$. If $c$ is $[+4, +6, -3]$, then lastAssertedLiteral $c$ $M$ = $+4$,
and maxLevel $c$ $M$ = 2.
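
The trail functions of Definition 3.10 can be sketched and checked against Example 3.11 as follows. The encoding is an assumption made for illustration: a trail is a Python list of `(literal, is_decision)` pairs with integer literals, and all function names are hypothetical.

```python
def elements(M):
    return [l for l, _ in M]


def decisions(M):
    return [l for l, d in M if d]


def last_decision(M):
    return decisions(M)[-1]


def decisions_to(l, M):
    """Marked elements up to and including the first occurrence of l."""
    els = elements(M)
    i = els.index(l) if l in els else len(els)
    return [x for x, d in M[:i + 1] if d]


def current_level(M):
    return len(decisions(M))


def level(l, M):
    return len(decisions_to(l, M))


def prefix_to_level(lvl, M):
    """Prefix of M containing the elements whose level is at most lvl."""
    out, seen = [], 0
    for l, d in M:
        seen += d                      # a decision opens a new level
        if seen > lvl:
            break
        out.append((l, d))
    return out


def last_asserted_literal(c, M):
    els = elements(M)
    return max((l for l in c if l in els), key=els.index)


def max_level(c, M):
    els = elements(M)
    return max(level(l, M) for l in c if l in els)


# The trail of Example 3.11 (True marks a decision literal).
M = [(1, False), (-2, True), (6, False), (5, True),
     (-3, False), (4, False), (-7, True)]
```

With this `M`, `decisions(M)` is `[-2, 5, -7]` and `prefix_to_level(1, M)` is `[(1, False), (-2, True), (6, False)]`, in agreement with the example.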



Footnote: Note that some of these functions are used only for some trails. For example, prefixBeforeLastDecision $M$
makes sense only for trails that contain at least one decision literal. Nevertheless, these functions are still
defined as total functions: for example, prefixBeforeLastDecision $M$ equals $M$ if there are no decision
literals.

4. DPLL Search

In this section we consider a basic transition system that contains only transition rules corresponding
to steps used in the original DPLL procedure: unit propagation, backtracking,
and making decisions for branching (described informally in Section 2.1; the pure literal
step is usually not used within modern SAT solvers, so it will be omitted). These rules will
be defined in the form of relations over states, in terms of the logic described in Section
3. It will be proved that the system containing these rules is terminating, sound, and complete.
The rules within the system are not ordered and the system is sound, terminating,
and complete regardless of any specific ordering. However, it will be obvious that better
performance is obtained if making decisions is maximally postponed, in the hope that it
will not be necessary.

4.1. States and Rules. The state of the solver performing the basic DPLL search consists
of the formula $F$ being tested for satisfiability (that remains unchanged) and the trail $M$
(that may change during the solver's operation). The only parameter to the solver is the
set of variables $DecVars$ used for branching. By $Vars$ we will denote the set of all variables
encountered during solving: these are the variables from the initial formula $F_0$ and the
decision variables $DecVars$, i.e., $Vars = \text{vars}\ F_0 \cup DecVars$.

Definition 4.1 (State). A state of the system is a pair $(M, F)$, where $M$ is a trail and $F$
is a formula. A state $([], F_0)$ is an initial state for the input formula $F_0$.

Transition rules are introduced by the following definition, in the form of relations over 
states. 

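Definition 4.2 (Transition rules).

unitPropagate $(M_1, F_1)$ $(M_2, F_2)$ iff
    $\exists c\, l.\ c \in F_1 \wedge \text{isUnit}\ c\ l\ M_1 \wedge M_2 = M_1\,@\,l^{i} \wedge F_2 = F_1$

backtrack $(M_1, F_1)$ $(M_2, F_2)$ iff
    $M_1 \vDash\!\neg\, F_1 \wedge \text{decisions}\ M_1 \ne [] \wedge$
    $M_2 = \text{prefixBeforeLastDecision}\ M_1\,@\,\overline{\text{lastDecision}\ M_1}^{\,i} \wedge F_2 = F_1$

decide $(M_1, F_1)$ $(M_2, F_2)$ iff
    $\exists l.\ \text{var}\ l \in DecVars \wedge l \notin M_1 \wedge \overline{l} \notin M_1 \wedge M_2 = M_1\,@\,l^{d} \wedge F_2 = F_1$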

As can be seen from the above definition (and in accordance with the description given in
Section 2.1), the rule unitPropagate uses a unit clause, i.e., a clause with only one literal $l$
undefined in $M_1$ and with all other literals false in $M_1$. Such a clause can be true only if
$l$ is true, so this rule extends $M_1$ by $l$ (as an implied literal). The rule backtrack is applied
when $F_1$ is false in $M_1$. Then it is said that a conflict occurred, and clauses from $F_1$ that
are false in $M_1$ are called conflict clauses. In that case, the last decision literal $l^{d}$ in $M_1$
and all literals that succeed it are removed from $M_1$, and the obtained prefix is extended
by $\overline{l}$ as an implied literal. The rule decide extends the trail by an arbitrary literal $l$ as a
decision literal, such that the variable of $l$ belongs to $DecVars$ and neither $l$ nor $\overline{l}$ occurs in
$M_1$. In that case, we say there is a branching on $l$.
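
Since the rules are relations rather than functions, one way to sketch them in code is as generators of all successor trails. The following Python sketch assumes integer-encoded literals and trails as lists of `(literal, is_decision)` pairs; the formula is left out of the result because none of the three rules changes it. All names are hypothetical and the code is an illustration, not the Isabelle/HOL formalization.

```python
def unit_propagate(M, F):
    """Yield every trail M2 with unitPropagate (M, F) (M2, F)."""
    lits = {l for l, _ in M}
    for c in F:
        for l in c:
            undefined = l not in lits and -l not in lits
            if undefined and all(-x in lits for x in c if x != l):
                yield M + [(l, False)]          # l is asserted as implied


def backtrack(M, F):
    """Yield the trail M2 with backtrack (M, F) (M2, F), if the rule applies."""
    lits = {l for l, _ in M}
    conflict = any(all(-l in lits for l in c) for c in F)
    marked = [i for i, (_, d) in enumerate(M) if d]
    if conflict and marked:
        i = marked[-1]                          # position of the last decision
        yield M[:i] + [(-M[i][0], False)]       # flip it, now as implied


def decide(M, dec_vars):
    """Yield every trail M2 with decide (M, F) (M2, F)."""
    lits = {l for l, _ in M}
    for v in sorted(dec_vars):
        if v not in lits and -v not in lits:
            yield M + [(v, True)]
            yield M + [(-v, True)]
```

A state is final exactly when all three generators are empty, which matches the notion of final state used in the completeness argument.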

The transition system considered is described by the relation $\to_d$, introduced by the
following definition.

Definition 4.3 ($\to_d$).

    $s_1 \to_d s_2$ iff unitPropagate $s_1$ $s_2$ $\vee$ backtrack $s_1$ $s_2$ $\vee$ decide $s_1$ $s_2$

Definition 4.4 (Outcome states). An outcome state is either an accepting state or a rejecting
state.

A state is an accepting state if $M \nvDash\!\neg\, F$ and there is no state $(M', F')$ such that
decide $(M, F)$ $(M', F')$ (i.e., there is no literal $l$ such that $\text{var}\ l \in DecVars$, $l \notin M$, and
$\overline{l} \notin M$).

A state is a rejecting state if $M \vDash\!\neg\, F$ and decisions $M = []$.

Note that the condition $M \nvDash\!\neg\, F$ in the above definition can be replaced by the condition
$M \vDash F$, but the former is used since its check can be more efficiently implemented.

Example 4.5. Let $F_0$ = [ [-1,+2], [-1,-3,+5,+7], [-1,-2,+5,-7], [-2,+3], [+2,+4],
[-2,-5,+7], [-3,-6,-7], [-5,+6] ]. One possible $\to_d$ trace is given below.

    rule                                          M
    --------------------------------------------  ------------------------------------------
                                                  []
    decide (l = +1)                               [+1^d]
    unitPropagate (c = [-1,+2], l = +2)           [+1^d, +2^i]
    unitPropagate (c = [-2,+3], l = +3)           [+1^d, +2^i, +3^i]
    decide (l = +4)                               [+1^d, +2^i, +3^i, +4^d]
    decide (l = +5)                               [+1^d, +2^i, +3^i, +4^d, +5^d]
    unitPropagate (c = [-5,+6], l = +6)           [+1^d, +2^i, +3^i, +4^d, +5^d, +6^i]
    unitPropagate (c = [-2,-5,+7], l = +7)        [+1^d, +2^i, +3^i, +4^d, +5^d, +6^i, +7^i]
    backtrack (M false on [-3,-6,-7])             [+1^d, +2^i, +3^i, +4^d, -5^i]
    unitPropagate (c = [-1,-3,+5,+7], l = +7)     [+1^d, +2^i, +3^i, +4^d, -5^i, +7^i]
    backtrack (M false on [-1,-2,+5,-7])          [+1^d, +2^i, +3^i, -4^i]
    decide (l = +5)                               [+1^d, +2^i, +3^i, -4^i, +5^d]
    unitPropagate (c = [-5,+6], l = +6)           [+1^d, +2^i, +3^i, -4^i, +5^d, +6^i]
    unitPropagate (c = [-2,-5,+7], l = +7)        [+1^d, +2^i, +3^i, -4^i, +5^d, +6^i, +7^i]
    backtrack (M false on [-3,-6,-7])             [+1^d, +2^i, +3^i, -4^i, -5^i]
    unitPropagate (c = [-1,-3,+5,+7], l = +7)     [+1^d, +2^i, +3^i, -4^i, -5^i, +7^i]
    backtrack (M false on [-1,-2,+5,-7])          [-1^i]
    decide (l = +2)                               [-1^i, +2^d]
    unitPropagate (c = [-2,+3], l = +3)           [-1^i, +2^d, +3^i]
    decide (l = +4)                               [-1^i, +2^d, +3^i, +4^d]
    decide (l = +5)                               [-1^i, +2^d, +3^i, +4^d, +5^d]
    unitPropagate (c = [-5,+6], l = +6)           [-1^i, +2^d, +3^i, +4^d, +5^d, +6^i]
    unitPropagate (c = [-2,-5,+7], l = +7)        [-1^i, +2^d, +3^i, +4^d, +5^d, +6^i, +7^i]
    backtrack (M false on [-3,-6,-7])             [-1^i, +2^d, +3^i, +4^d, -5^i]
    decide (l = +6)                               [-1^i, +2^d, +3^i, +4^d, -5^i, +6^d]
    unitPropagate (c = [-3,-6,-7], l = -7)        [-1^i, +2^d, +3^i, +4^d, -5^i, +6^d, -7^i]

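
To make the behaviour of the system concrete, the following Python sketch runs the three rules until an outcome state is reached. It is an illustration under several assumptions that are not part of the formal system: literals are integers, the trail is a list of `(literal, is_decision)` pairs, and one fixed strategy is chosen (conflicts are handled first, then unit propagation, and decisions are postponed as long as possible, always picking the smallest undefined variable positively). Because the strategy differs from the one used in the trace above, the intermediate trails may differ as well.

```python
def run(F, dec_vars):
    """Apply backtrack / unitPropagate / decide until a final state is reached.

    Returns ("sat", M) for an accepting state and ("unsat", M) for a
    rejecting one. dec_vars is assumed to contain all variables of F.
    """
    M = []
    while True:
        lits = {l for l, _ in M}
        # Conflict: some clause has all of its literals false in M.
        if any(all(-l in lits for l in c) for c in F):
            marked = [i for i, (_, d) in enumerate(M) if d]
            if not marked:
                return "unsat", M                    # rejecting state
            i = marked[-1]
            M = M[:i] + [(-M[i][0], False)]          # backtrack
            continue
        # Unit propagation: first unit literal found in clause order.
        unit = next((l for c in F for l in c
                     if l not in lits and -l not in lits
                     and all(-x in lits for x in c if x != l)), None)
        if unit is not None:
            M = M + [(unit, False)]                  # unitPropagate
            continue
        free = [v for v in sorted(dec_vars)
                if v not in lits and -v not in lits]
        if not free:
            return "sat", M                          # accepting state
        M = M + [(free[0], True)]                    # decide


F0 = [[-1, 2], [-1, -3, 5, 7], [-1, -2, 5, -7], [-2, 3], [2, 4],
      [-2, -5, 7], [-3, -6, -7], [-5, 6]]
status, trail = run(F0, range(1, 8))
```

For the formula of Example 4.5 the loop ends in an accepting state, and the literals of the final trail form a model of $F_0$.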


4.2. Properties. In order to prove that the presented transition system is terminating,
sound, and complete, first, local properties of the transition rules have to be given in the
form of certain invariants.

4.2.1. Invariants. For proving properties of the described transition system, several relevant
rule invariants will be used (not all of them are used for proving each of soundness,
completeness, and termination, but we list them all here for the sake of simplicity).

    $Inv_{consistent}$:    consistent $M$
    $Inv_{distinct}$:      distinct $M$
    $Inv_{varsM}$:         vars $M \subseteq Vars$
    $Inv_{impliedLits}$:   $\forall l.\ l \in M \longrightarrow (F\,@\,\text{decisionsTo}\ l\ M) \vDash l$
    $Inv_{equiv}$:         $F \equiv F_0$
    $Inv_{varsF}$:         vars $F \subseteq Vars$

The condition $Inv_{consistent}$ states that the trail $M$ can potentially be a model of the
formula, and $Inv_{distinct}$ requires that it contains no repeating elements. The $Inv_{impliedLits}$
ensures that any literal $l$ in $M$ is entailed by $F$ with all decision literals that precede $l$.

Notice that the given rules do not change formulae in states, so it trivially holds that
$F = F_0$, which further implies $Inv_{equiv}$ and $Inv_{varsF}$. However, the transition systems
that follow in the next sections may change formulae, so the above set of invariants is
more appropriate. If only testing satisfiability is considered (and not building models for
satisfiable formulae), instead of $Inv_{equiv}$, it is sufficient to require that $F$ and $F_0$ are weakly
equivalent (i.e., equisatisfiable).

The above conditions are indeed invariants (i.e., they are met for each state during the 
application of the rules), as stated by the following lemma. 

Lemma 4.6.

(1) In the initial state $([], F_0)$ the invariants hold.

(2) If $(M, F) \to_d (M', F')$ and if the invariants are met in the state $(M, F)$, then they are
met in the state $(M', F')$ too.

(3) If $([], F_0) \to_d^{*} (M, F)$, then all the invariants hold in the state $(M, F)$.

The proof of this lemma considers a number of cases — one for each rule-invariant pair. 



4.2.2. Soundness. Soundness of the given transition system requires that if the system terminates
in an accepting state, then the input formula is satisfiable, and if the system
terminates in a rejecting state, then the input formula is unsatisfiable.

The following lemma ensures soundness for satisfiable input formulae, and the next
one is used for proving soundness for unsatisfiable input formulae (but also in some other
contexts).

Lemma 4.7. If $DecVars \supseteq \text{vars}\ F_0$ and if there is an accepting state $(M, F)$ such that:

(1) consistent $M$ (i.e., $Inv_{consistent}$ holds),

(2) $F \equiv F_0$ (i.e., $Inv_{equiv}$ holds),

(3) vars $F \subseteq Vars$ (i.e., $Inv_{varsF}$ holds),

then the formula $F_0$ is satisfiable and $M$ is one model (i.e., model $M$ $F_0$).

Lemma 4.8. If there is a state $(M, F)$ such that:

(1) $\forall l.\ l \in M \longrightarrow (F\,@\,\text{decisionsTo}\ l\ M) \vDash l$ (i.e., $Inv_{impliedLits}$ holds),

(2) $M \vDash\!\neg\, F$,

then $\neg(\text{sat}\ (F\,@\,\text{decisions}\ M))$.

Theorem 4.9 (Soundness for $\to_d$). If $([], F_0) \to_d^{*} (M, F)$, then:

(1) If $DecVars \supseteq \text{vars}\ F_0$ and $(M, F)$ is an accepting state, then the formula $F_0$ is satisfiable
and $M$ is one model (i.e., sat $F_0$ and model $M$ $F_0$).

(2) If $(M, F)$ is a rejecting state, then the formula $F_0$ is unsatisfiable (i.e., $\neg(\text{sat}\ F_0)$).

Proof. By Lemma 4.6, all the invariants hold in the state $(M, F)$.

Let us assume that $DecVars \supseteq \text{vars}\ F_0$ and $(M, F)$ is an accepting state. Then, by
Lemma 4.7, the formula $F_0$ is satisfiable and $M$ is one model.

Let us assume that $(M, F)$ is a rejecting state. Then $M \vDash\!\neg\, F$ and, by Lemma 4.8,
$\neg(\text{sat}\ (F\,@\,(\text{decisions}\ M)))$. Since $(M, F)$ is a rejecting state, it holds that decisions $M = []$,
and hence $\neg(\text{sat}\ F)$. From $F \equiv F_0$ ($Inv_{equiv}$), it follows that $\neg(\text{sat}\ F_0)$, i.e., the formula $F_0$
is unsatisfiable. □



4.2.3. Termination. Full and precise formalization of termination is very demanding, and
termination proofs given in the literature (e.g., [KG07, NOT06]) are far from detailed formal
proofs. For this reason, termination proofs will be presented here in more detail, including
auxiliary lemmas used to prove the termination theorem.

The described transition system terminates, i.e., for any input formula $F_0$, the system
(starting from the initial state $([], F_0)$) will reach a final state in a finite number of steps.
In other words, the relation $\to_d$ is well-founded. This can be proved by constructing a
well-founded partial ordering $\succ$ over trails, such that $(M_1, F_1) \to_d (M_2, F_2)$ implies $M_1 \succ M_2$.
In order to reach this goal, several auxiliary orderings are defined.

First, a partial ordering over annotated literals $\succ_{lit}$ and a partial ordering over trails $\succ_{tr}$
will be introduced and some of their properties will be given within the following lemmas.

Definition 4.10 ($\succ_{lit}$). $l_1 \succ_{lit} l_2$ iff isDecision $l_1 \wedge \neg(\text{isDecision}\ l_2)$

Lemma 4.11. $\succ_{lit}$ is transitive and irreflexive.

Definition 4.12 ($\succ_{tr}$). $M_1 \succ_{tr} M_2$ iff $M_1 \succ_{lit}^{lex} M_2$, where $\succ_{lit}^{lex}$ is a lexicographic extension
of $\succ_{lit}$.

Lemma 4.13. $\succ_{tr}$ is transitive, irreflexive, and acyclic (i.e., there is no trail $M$ such that
$M \succ_{tr} M$).

For any three trails $M$, $M'$, and $M''$ it holds that: if $M' \succ_{tr} M''$, then $M\,@\,M' \succ_{tr}
M\,@\,M''$.

The next lemma links relations $\to_d$ and $\succ_{tr}$.

Lemma 4.14. If decide $(M_1, F_1)$ $(M_2, F_2)$ or unitPropagate $(M_1, F_1)$ $(M_2, F_2)$ or backtrack
$(M_1, F_1)$ $(M_2, F_2)$, then $M_1 \succ_{tr} M_2$.

The relation $\succ_{tr}$ is not necessarily well-founded (for the elements of the trails range
over infinite sets), so a restriction $\succ_{tr}|_{Vbl}$ of the relation $\succ_{tr}$ will be defined such that it is
well-founded, which will lead to the termination proof for the system.

Definition 4.15 ($\succ_{tr}|_{Vbl}$). $M_1 \succ_{tr}|_{Vbl}\, M_2$ iff (distinct $M_1 \wedge \text{vars}\ M_1 \subseteq Vbl$) $\wedge$ (distinct $M_2 \wedge
\text{vars}\ M_2 \subseteq Vbl$) $\wedge\ M_1 \succ_{tr} M_2$

Lemma 4.16. If the set $Vbl$ is finite, then the relation $\succ_{tr}|_{Vbl}$ is a well-founded ordering.
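
The ordering on trails can be illustrated with a short Python sketch. It assumes trails encoded as lists of `(literal, is_decision)` pairs and reads the lexicographic extension so that a trail is greater than each of its proper extensions, which is the reading under which extending the trail (as decide and unitPropagate do) yields a smaller trail. The function names are hypothetical.

```python
def lit_greater(a, b):
    """a >_lit b: a is a decision element and b is an implied one."""
    return a[1] and not b[1]


def trail_greater(s, t):
    """s >_tr t: lexicographic extension of >_lit to trails."""
    for a, b in zip(s, t):
        if a != b:
            # First position where the trails differ decides the comparison.
            return lit_greater(a, b)
    # No difference on the common part: s is greater iff t properly extends s.
    return len(s) < len(t)


# One backtrack step from the trace of Example 4.5.
M1 = [(1, True), (2, False), (3, False), (4, True),
      (5, True), (6, False), (7, False)]
M2 = [(1, True), (2, False), (3, False), (4, True), (-5, False)]
```

Here `trail_greater(M1, M2)` holds because the first differing position carries a decision literal in `M1` and an implied literal in `M2`; a decide or unitPropagate step makes the trail smaller because the new trail properly extends the old one.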
Finally, we prove that the transition system is terminating.

Theorem 4.17 (Termination for $\to_d$). If the set $DecVars$ is finite, for any formula $F_0$, the
relation $\to_d$ is well-founded on the set of states $(M, F)$ such that $([], F_0) \to_d^{*} (M, F)$.

Proof. By Proposition 3.4, it suffices to construct a well-founded ordering $\succ$ on the set of states
$(M, F)$ with $([], F_0) \to_d^{*} (M, F)$ such that

    $(M_1, F_1) \to_d (M_2, F_2) \longrightarrow (M_1, F_1) \succ (M_2, F_2)$.

One such ordering is $\succ$ defined by: $(M_1, F_1) \succ (M_2, F_2)$ iff $M_1 \succ_{tr}|_{Vars}\, M_2$.

Indeed, since by Lemma 4.16, $\succ_{tr}|_{Vars}$ is well-founded, by Proposition 3.4 (for a function
mapping $(M, F)$ to $M$), $\succ$ is also a well-founded ordering.

Let $(M_1, F_1)$ and $(M_2, F_2)$ be two states such that $([], F_0) \to_d^{*} (M_1, F_1)$ and $(M_1, F_1) \to_d
(M_2, F_2)$. By Lemma 4.6, all the invariants hold for $(M_1, F_1)$. From $(M_1, F_1) \to_d (M_2, F_2)$,
by Lemma 4.14, it follows that $M_1 \succ_{tr} M_2$. Moreover, by Lemma 4.6, all the invariants
hold also for $(M_2, F_2)$, so distinct $M_1$, vars $M_1 \subseteq Vars$, distinct $M_2$, and vars $M_2 \subseteq Vars$.
Ultimately, $M_1 \succ_{tr}|_{Vars}\, M_2$. □

4.2.4. Completeness. Completeness requires that all final states are outcome states.

Theorem 4.18 (Completeness for $\to_d$). Each final state is either accepting or rejecting.

Proof. Let $(M, F)$ be a final state. It holds that either $M \vDash\!\neg\, F$ or $M \nvDash\!\neg\, F$.

If $M \nvDash\!\neg\, F$, since there is no state $(M', F')$ such that decide $(M, F)$ $(M', F')$ (as $(M, F)$
is a final state), there is no literal $l$ such that $\text{var}\ l \in DecVars$, $l \notin M$, and $\overline{l} \notin M$, so $(M, F)$
is an accepting state.

If $M \vDash\!\neg\, F$, since there is no state $(M', F')$ such that backtrack $(M, F)$ $(M', F')$ (as
$(M, F)$ is a final state), it holds that decisions $M = []$, so $(M, F)$ is a rejecting state. □

Notice that from the proof it is clear that the basic search system consisting only of
the rules decide and backtrack is complete.

4.2.5. Correctness. Theorems 4.9, 4.17, and 4.18 directly lead to the theorem about
correctness of the introduced transition system.

Theorem 4.19 (Correctness for $\to_d$). The given transition system is correct, i.e., if all
variables of the input formula belong to the set $DecVars$, then for any satisfiable input
formula, the system terminates in an accepting state, and for any unsatisfiable formula, the
system terminates in a rejecting state.

5. Backjumping 

In this section, we consider a transition system that replaces naive chronological backtracking
by more advanced nonchronological backjumping.



Footnote: Correctness of the system can be proved with a weaker condition. Namely, instead of the condition
that all variables of the input formula belong to the set $DecVars$, it is sufficient that all strong backdoor
variables belong to $DecVars$ [BHMW09], but that weaker condition is not considered here.

5.1. States and Rules. The rules of the new system are given (as in Section 4) in the
form of relations over states.

Definition 5.1 (Transition rules).

unitPropagate $(M_1, F_1)$ $(M_2, F_2)$ iff
    $\exists c\, l.\ F_1 \vDash c \wedge \text{var}\ l \in Vars \wedge \text{isUnit}\ c\ l\ M_1 \wedge$
    $M_2 = M_1\,@\,l^{i} \wedge F_2 = F_1$

backjump $(M_1, F_1)$ $(M_2, F_2)$ iff
    $\exists c\, l\, P\, level.\ F_1 \vDash c \wedge \text{var}\ l \in Vars \wedge$
    $P = \text{prefixToLevel}\ level\ M_1 \wedge 0 \le level < \text{currentLevel}\ M_1 \wedge$
    $\text{isUnit}\ c\ l\ P \wedge$
    $F_2 = F_1 \wedge M_2 = P\,@\,l^{i}$

decide $(M_1, F_1)$ $(M_2, F_2)$ iff
    $\exists l.\ \text{var}\ l \in DecVars \wedge l \notin M_1 \wedge \overline{l} \notin M_1 \wedge M_2 = M_1\,@\,l^{d} \wedge F_2 = F_1$

In the following, the transition system described by the relation $\to_b$ defined by these
rules will be considered.

The key difference between the new transition system and the one built over the rules given
in Definition 4.2 is the rule backjump (that replaces the rule backtrack). The rule decide is
the same as the one given in Definition 4.2, while the rule unitPropagate is slightly modified
(i.e., its guard is relaxed).

The clause $c$ in the backjump rule is called a backjump clause and the level $level$ is called
a backjump level. The given definition of the backjump rule is very general: it does not
specify how the backjump clause $c$ is constructed and what prefix $P$ (i.e., the level $level$) is
chosen if there are several options. There are different strategies that specify these choices
and they are required for concrete implementations. The conditions that $P$ is a prefix to a
level (i.e., that $P$ is followed by a decision literal in $M_1$) and that this level is smaller than
the current level are important only for termination. Soundness can be proved even with
a weaker assumption that $P$ is an arbitrary prefix of $M_1$. However, usually the shortest
possible prefix $P$ is taken. The backtrack rule can be seen as a special case of the backjump
rule. In that special case, the clause $c$ is built of opposites of all decision literals in the trail
and $P$ becomes prefixBeforeLastDecision $M_1$.

Notice that the backjump clause $c$ does not necessarily belong to $F_1$ but can be an
arbitrary logical consequence of it. So, instead of $c \in F_1$, weaker conditions $F_1 \vDash c$ and
$\text{var}\ l \in Vars$ are used in the backjump rule (the latter condition is important only for
termination). This weaker condition (inspired by the use of SAT engines in SMT solvers)
can be used also for the unitPropagate rule and leads from the rule given in Definition 4.2 to
its present version (this change is not relevant for the system correctness). The new version
of unitPropagate has many similarities with the backjump rule: the only difference is that
the backjump rule always asserts the implied literal to a proper prefix of the trail.

Example 5.2. Let $F_0$ be the same formula as in Example 4.5. One possible $\to_b$ trace is
given below. Note that, unlike in the trace shown in Example 4.5, the decision literal $+4$
is removed from the trail during backjumping, since it was detected to be irrelevant for the
conflict, resulting in a shorter trace. The deduction of backjump clauses (e.g., $[-2, -3, -5]$)
will be presented in Example 7.4.

    rule                                          M
    --------------------------------------------  ------------------------------------------
    ...                                           ...
    backjump (c = [-1], l = -1)                   [-1^i]
    decide (l = +2)                               [-1^i, +2^d]
    unitPropagate (c = [-2,+3], l = +3)           [-1^i, +2^d, +3^i]
    decide (l = +4)                               [-1^i, +2^d, +3^i, +4^d]
    decide (l = +5)                               [-1^i, +2^d, +3^i, +4^d, +5^d]
    unitPropagate (c = [-5,+6], l = +6)           [-1^i, +2^d, +3^i, +4^d, +5^d, +6^i]
    unitPropagate (c = [-2,-5,+7], l = +7)        [-1^i, +2^d, +3^i, +4^d, +5^d, +6^i, +7^i]
    backjump (c = [-2,-3,-5])                     [-1^i, +2^d, +3^i, -5^i]
    decide (l = +4)                               [-1^i, +2^d, +3^i, -5^i, +4^d]
    decide (l = +6)                               [-1^i, +2^d, +3^i, -5^i, +4^d, +6^d]
    unitPropagate (c = [-3,-6,-7], l = -7)        [-1^i, +2^d, +3^i, -5^i, +4^d, +6^d, -7^i]


5.2. Backjump Levels. In Definition 5.1, for the backjump rule to be applicable, it is
required that there is a level of the trail such that the backjump clause is unit in the prefix
to that level. The following definition gives a stronger condition (used in modern SAT
solvers) for a level ensuring applicability of the backjump rule to that level.

Definition 5.3 (Backjump level). A backjump level for the given backjump clause $c$ (false
in $M$) is a level $level$ that is strictly less than the level of the last falsified literal from $c$,
and greater than or equal to the levels of the remaining literals from $c$:

    isBackjumpLevel $level$ $l$ $c$ $M$ iff $M \vDash\!\neg\, c \wedge \overline{l} = \text{lastAssertedLiteral}\ \overline{c}\ M \wedge$
    $0 \le level < \text{level}\ \overline{l}\ M \wedge$
    $\forall l'.\ l' \in c \setminus l \longrightarrow \text{level}\ \overline{l'}\ M \le level$

Here $\overline{c}$ denotes the list of opposites of the literals of $c$; since $c$ is false in $M$, these
opposites are exactly the literals of $M$ that falsify $c$.

Using this definition, the backjump rule can be defined in a more concrete and more
operational way.

backjump' $(M_1, F_1)$ $(M_2, F_2)$ iff
    $\exists c\, l\, level.\ F_1 \vDash c \wedge \text{var}\ l \in Vars \wedge$
    $\text{isBackjumpLevel}\ level\ l\ c\ M_1 \wedge$
    $M_2 = (\text{prefixToLevel}\ level\ M_1)\,@\,l^{i} \wedge F_2 = F_1$

Notice that, unlike in Definition 5.1, it is required that the backjump clause is false, so
this new rule is applicable only in conflict situations.

It still remains unspecified how the clause c is constructed. Also, it is required to check
whether the clause c is false in the current trail M and implied by the current formula F. In
Section 7 it will be shown that if a clause c is built during a conflict analysis process, these
conditions will hold by construction and so it will not be necessary to check them explicitly.
Calculating the level of each literal from c (required for the backjump level condition) will
also be avoided.

The following lemmas connect the backjump and backjump' rules.

Lemma 5.4. If:

(1) consistent M (i.e., $Inv_{consistent}$ holds),

(2) unique M (i.e., $Inv_{unique}$ holds),

(3) isBackjumpLevel level l c M,

then isUnit c l (prefixToLevel level M).

Lemma 5.5. If a state (M, F) satisfies the invariants and if backjump' (M, F) (M', F'),
then backjump (M, F) (M', F').

Because of the very close connection between the relations backjump and backjump',
we will not explicitly define two different transition relations $\to_b$. Most of the correctness
arguments apply to both these relations, and hence only differences will be emphasized.

Although there are typically many levels satisfying the backjump level condition (i.e.,
backjumping can be applied to each level that is strictly below the level of the last falsified
literal from c and not below the levels of the remaining literals from c), usually it is applied
to the lowest possible level, i.e., to the level that is a backjump level such that there is no
smaller level that is also a backjump level. The following definition formally introduces the
notion of a minimal backjump level.

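Definition 5.6 (isMinimalBackjumpLevel).

\[
\begin{aligned}
\mathsf{isMinimalBackjumpLevel}\ \mathit{level}\ l\ c\ M \iff{}& \mathsf{isBackjumpLevel}\ \mathit{level}\ l\ c\ M \;\wedge \\
& (\forall\, \mathit{level}' < \mathit{level}.\ \neg\, \mathsf{isBackjumpLevel}\ \mathit{level}'\ l\ c\ M)
\end{aligned}
\]

Although most solvers use minimal levels when backjumping, this will be formally
required only for systems introduced in Section 8.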
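
The backjump level conditions can be illustrated with a small executable sketch. It is not part of the Isabelle/HOL formalization: the trail representation (a list of (literal, is_decision) pairs over integer literals) and all function names are assumptions made for illustration, and the trail is assumed to contain no repeated literals.

```python
from typing import List, Optional, Tuple

# Assumed representation (not from the paper): a trail is a list of
# (literal, is_decision) pairs; literals are non-zero integers.
Trail = List[Tuple[int, bool]]


def level(lit: int, trail: Trail) -> int:
    """Number of decision literals in the trail up to and including lit."""
    count = 0
    for l, is_decision in trail:
        if is_decision:
            count += 1
        if l == lit:
            return count
    raise ValueError(f"literal {lit} is not in the trail")


def prefix_to_level(lvl: int, trail: Trail) -> Trail:
    """Longest prefix of the trail whose literals all have level <= lvl."""
    prefix, count = [], 0
    for l, is_decision in trail:
        if is_decision:
            count += 1
        if count > lvl:
            break
        prefix.append((l, is_decision))
    return prefix


def is_backjump_level(lvl: int, l: int, clause: List[int], trail: Trail) -> bool:
    on_trail = [x for x, _ in trail]
    falsified = [-x for x in clause]            # opposites of the clause literals
    if l not in clause or any(x not in on_trail for x in falsified):
        return False                            # the clause must be false in the trail
    if max(falsified, key=on_trail.index) != -l:
        return False                            # the opposite of l must be asserted last
    others_ok = all(level(-x, trail) <= lvl for x in clause if x != l)
    return 0 <= lvl < level(-l, trail) and others_ok


def minimal_backjump_level(l: int, clause: List[int], trail: Trail) -> Optional[int]:
    """Smallest level satisfying is_backjump_level, or None if there is none."""
    for lvl in range(len(trail) + 1):
        if is_backjump_level(lvl, l, clause, trail):
            return lvl
    return None


# Trail [+1^d, +2^i, +3^i, +4^d, +5^d, +6^i, +7^i] and clause [-2, -3, -5]
trail = [(1, True), (2, False), (3, False), (4, True), (5, True), (6, False), (7, False)]
lvl = minimal_backjump_level(-5, [-2, -3, -5], trail)
new_trail = prefix_to_level(lvl, trail) + [(-5, False)]
```

For this trail, levels 1 and 2 both satisfy the backjump level condition, and the minimal one leads to the trail [+1^d, +2^i, +3^i, -5^i].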

5.3. Properties. As in Section 4, local properties of the transition rules in the form of
certain invariants are used in proving properties of the transition system.

5.3.1. Invariants. The invariants required for proving soundness, termination, and
completeness of the new system are the same as the invariants listed in Section 4. So, it is
required to prove that the rule backjump and the modified rule unitPropagate preserve all the
invariants. Therefore, Lemma 4.6 has to be updated to address the new rules and its proof has
to be modified to reflect the changes in the definition of the transition relation.

5.3.2. Soundness and Termination. The soundness theorem (Theorem 4.9) has to be
updated to address the new rules, but its proof remains analogous to the one given in
Section 4.

The termination theorem (Theorem 4.17) also has to be updated, and its proof again
remains analogous to the one given in Section 4. However, in addition to Lemma 4.14, the
following lemma has to be used.



Lemma 5.7. If backjump $(M_1, F_1)$ $(M_2, F_2)$, then $M_1 \succ_{tr} M_2$.

This proof relies on the following property of the relation $\succ_{tr}$.

Lemma 5.8. If M is a trail and P = prefixToLevel level M, such that $0 \le \mathit{level} <$
currentLevel M, then $M \succ_{tr} P\ @\ l^{i}$.

5.3.3. Completeness and Correctness. Completeness of the system is proved partly in
analogy with the completeness proof of the system described in Section 4, given in Theorem
4.18. When (M, F) is a final state and $M \nvDash\!\neg\, F$, the proof remains the same as for Theorem
4.18. When (M, F) is a final state and $M \vDash\!\neg\, F$, for the new system it is not trivial that this
state is a rejecting state (i.e., it is not trivial that decisions M = []). Therefore, it has to
be proved, given that the invariants hold, that if backjumping is not applicable in a conflict
situation (when $M \vDash\!\neg\, F$), then decisions M = [] (i.e., if decisions M $\ne$ [], then backjump' is
applicable, and so is backjump). The proof relies on the fact that a backjump clause may be
constructed solely from the opposites of all decision literals. This is the simplest way to
construct a backjump clause c, and in this case backjumping degenerates to backtracking. The
clause c constructed in this way meets sufficient (but, of course, not necessary) conditions for
the applicability of backjump' (and, consequently, by Lemma 5.5, for the applicability of backjump).

Lemma 5.9. If for a state (M, F) it holds that:

(1) consistent M (i.e., $Inv_{consistent}$ holds),

(2) unique M (i.e., $Inv_{unique}$ holds),

(3) $\forall\, l.\ l \in M \longrightarrow F\ @\ (\mathsf{decisionsTo}\ l\ M) \vDash l$ (i.e., $Inv_{impliedLits}$ holds),

(4) vars M $\subseteq$ Vars (i.e., $Inv_{varsM}$ holds),

(5) $M \vDash\!\neg\, F$,

(6) decisions M $\ne$ [],

then there is a state (M', F') such that backjump' (M, F) (M', F').
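
The construction behind Lemma 5.9 can be sketched as follows. This is a hedged illustration rather than the formal proof: the (literal, is_decision) trail representation and the function name are assumptions, and the sketch does not check the lemma's premises (in particular, that the trail falsifies the formula and that the invariants hold).

```python
from typing import List, Optional, Tuple

Trail = List[Tuple[int, bool]]  # assumed: (literal, is_decision) pairs


def backtrack_via_decision_clause(trail: Trail) -> Optional[Tuple[List[int], Trail]]:
    """Build the clause of opposites of all decisions and backjump with it."""
    decisions = [l for l, is_decision in trail if is_decision]
    if not decisions:
        return None                      # decisions M = []: nothing to undo
    clause = [-l for l in decisions]     # opposites of all decision literals
    flipped = -decisions[-1]             # opposite of the last decision
    target = len(decisions) - 1          # level just below the last decision
    prefix, count = [], 0
    for l, is_decision in trail:
        if is_decision:
            count += 1
        if count > target:
            break
        prefix.append((l, is_decision))
    # The flipped literal is added as an implied (non-decision) literal.
    return clause, prefix + [(flipped, False)]


trail = [(1, True), (2, False), (3, False), (4, True), (5, True), (6, False), (7, False)]
result = backtrack_via_decision_clause(trail)
```

The target level is one below the level of the last decision, so only the last decision is undone, which is exactly classical backtracking.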

To ensure applicability of Lemma 5.9, the new version of the completeness theorem
(Theorem 4.18) requires that the invariants hold in the current state. Since, by Lemma 5.5,
backjump' (M, F) (M', F') implies backjump (M, F) (M', F'), the following completeness
theorem holds for both transition systems presented in this section (using the rule backjump'
or the rule backjump).

Theorem 5.10 (Completeness for $\to_b$). If $([\,], F_0) \to_b^{*} (M, F)$, and (M, F) is a final state,
then (M, F) is either accepting or rejecting.

Proof. Let (M, F) be a final state. By Lemma 4.6, all invariants hold in (M, F). Also, it
holds that either $M \vDash\!\neg\, F$ or $M \nvDash\!\neg\, F$.

If $M \nvDash\!\neg\, F$, since decide is not applicable, (M, F) is an accepting state.

If $M \vDash\!\neg\, F$, assume that decisions M $\ne$ []. By Lemma 5.9, there is a state (M', F')
such that backjump' (M, F) (M', F'). This contradicts the assumption that (M, F) is a final
state. Therefore, decisions M = [], and since $M \vDash\!\neg\, F$, (M, F) is a rejecting state. □

Correctness of the system is a consequence of soundness, termination, and completeness,
in analogy with Theorem 4.19.

6. Learning and Forgetting

In this section we briefly describe a system obtained from the system introduced in Section 5
by adding two new transition rules. These rules will have a significant role in more complex
systems discussed in the following sections.

6.1. States and Rules. The relation $\to_b$ introduced in Section 5 is extended by the
following two transition rules (introduced in the form of relations over states).

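Definition 6.1 (Transition rules).

\[
\begin{aligned}
\mathsf{learn}\ (M_1, F_1)\ (M_2, F_2) \iff{}& \exists\, c.\ F_1 \vDash c \;\wedge\; \mathsf{vars}\ c \subseteq \mathit{Vars} \;\wedge \\
& F_2 = F_1\ @\ c \;\wedge\; M_2 = M_1 \\
\mathsf{forget}\ (M_1, F_1)\ (M_2, F_2) \iff{}& \exists\, c.\ F_1 \setminus c \vDash c \;\wedge \\
& F_2 = F_1 \setminus c \;\wedge\; M_2 = M_1
\end{aligned}
\]

The extended transition system will be denoted by $\to_l$.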

The learn rule is defined very generally. It is not specified how to construct the clause c;
typically, only clauses resulting from the conflict analysis process (Section 7) are learnt.
This is the only rule so far that changes F, but the condition $F \vDash c$ ensures that it always
remains logically equivalent to the initial formula $F_0$. The condition vars c $\subseteq$ Vars is relevant
only for ensuring termination.

The forget rule changes the formula by removing a clause that is implied by all other
clauses (i.e., is redundant). It is also not specified how this clause c is chosen.
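
The guards of learn and forget are semantic (entailment) conditions. As a minimal sketch of what they require, the following code checks entailment by brute-force enumeration of valuations. This is only an illustration for tiny formulas, not how solvers establish these conditions; the clause-list representation, the function names, and the three-clause fragment used in the usage lines are assumptions made for the example.

```python
from itertools import product
from typing import List, Optional, Set

Clause = List[int]       # assumed: a clause is a list of non-zero integers
Formula = List[Clause]


def entails(formula: Formula, clause: Clause) -> bool:
    """Brute-force check that every model of formula satisfies clause."""
    variables = sorted({abs(l) for c in formula + [clause] for l in c})
    for bits in product([False, True], repeat=len(variables)):
        value = dict(zip(variables, bits))

        def holds(c: Clause) -> bool:
            return any(value[abs(l)] == (l > 0) for l in c)

        if all(holds(c) for c in formula) and not holds(clause):
            return False                 # a model of formula falsifies clause
    return True


def learn(formula: Formula, clause: Clause, allowed_vars: Set[int]) -> Optional[Formula]:
    """Add clause if it is entailed and uses only allowed variables."""
    if entails(formula, clause) and {abs(l) for l in clause} <= allowed_vars:
        return formula + [clause]
    return None


def forget(formula: Formula, clause: Clause) -> Optional[Formula]:
    """Remove clause if the remaining clauses still entail it."""
    rest = [c for c in formula if c != clause]
    if clause in formula and entails(rest, clause):
        return rest
    return None


# A three-clause fragment chosen for illustration (not the full formula F0).
fragment = [[-5, 6], [-2, -5, 7], [-3, -6, -7]]
learnt = learn(fragment, [-2, -3, -5], set(range(1, 8)))
```

Since learning adds an entailed clause and forgetting removes a redundant one, applying forget to the clause just learnt returns the original fragment, which also shows how the two rules can be applied cyclically.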

Example 6.2. Let $F_0$ be a formula from Example 4.5. A possible $\to_l$ trace is given by
(note that, unlike in the trace shown in Example 5.2, a clause [-1, -2, -3] is learnt and
used afterwards for unit propagation in another part of the search tree, eventually leading
to a shorter trace):

rule                                          M                                            F
                                              []                                           F0
decide (l = +1)                               [+1^d]                                       F0
unitPropagate (c = [-1, +2], l = +2)          [+1^d, +2^i]                                 F0
unitPropagate (c = [-2, +3], l = +3)          [+1^d, +2^i, +3^i]                           F0
decide (l = +4)                               [+1^d, +2^i, +3^i, +4^d]                     F0
decide (l = +5)                               [+1^d, +2^i, +3^i, +4^d, +5^d]               F0
unitPropagate (c = [-5, +6], l = +6)          [+1^d, +2^i, +3^i, +4^d, +5^d, +6^i]         F0
unitPropagate (c = [-2, -5, +7], l = +7)      [+1^d, +2^i, +3^i, +4^d, +5^d, +6^i, +7^i]   F0
backjump (c = [-2, -3, -5], l = -5)           [+1^d, +2^i, +3^i, -5^i]                     F0
learn (c = [-2, -3, -5])                      [+1^d, +2^i, +3^i, -5^i]                     F0 @ [-2, -3, -5]
unitPropagate (c = [-1, -3, +5, +7], l = +7)  [+1^d, +2^i, +3^i, -5^i, +7^i]               F0 @ [-2, -3, -5]
backjump (c = [-1], l = -1)                   [-1^i]                                       F0 @ [-2, -3, -5]
decide (l = +2)                               [-1^i, +2^d]                                 F0 @ [-2, -3, -5]
unitPropagate (c = [-2, +3], l = +3)          [-1^i, +2^d, +3^i]                           F0 @ [-2, -3, -5]
unitPropagate (c = [-2, -3, -5], l = -5)      [-1^i, +2^d, +3^i, -5^i]                     F0 @ [-2, -3, -5]
decide (l = +4)                               [-1^i, +2^d, +3^i, -5^i, +4^d]               F0 @ [-2, -3, -5]
decide (l = +6)                               [-1^i, +2^d, +3^i, -5^i, +4^d, +6^d]         F0 @ [-2, -3, -5]
unitPropagate (c = [-3, -6, -7], l = -7)      [-1^i, +2^d, +3^i, -5^i, +4^d, +6^d, -7^i]   F0 @ [-2, -3, -5]

6.2. Properties. The new set of rules preserves all the invariants given in Section 4.2.1.
Indeed, since learn and forget do not change the trail M, all invariants about the trail
itself are trivially preserved by these rules. It can be proved that $Inv_{equiv}$, $Inv_{varsF}$ and
$Inv_{impliedLiterals}$ also hold for the new rules.

Since the invariants are preserved in the new system, soundness is proved as in Theorem
4.9. Completeness trivially holds, since introducing new rules to a complete system cannot
compromise its completeness. However, the extended system is not terminating since the
learn and forget rules can be cyclically applied. Termination could be ensured with some
additional restrictions. Specific learning, forgetting and backjumping strategies that ensure
termination will be defined and discussed in Sections 7 and 8.

7. Conflict Analysis 

The backjumping rules, as defined in Section 5, are very general. If backjump clauses
faithfully reflect the current conflict, they typically lead to significant pruning of the search
space. In this section we will consider a transition system that employs conflict analysis in
order to construct backjump clauses, which can (in addition) be immediately learned (by
the rule learn).

7.1. States and Rules. The system with conflict analysis requires extending the definition
of state introduced in Section 4.

Definition 7.1 (State). A state of the system is a four-tuple (M, F, C, cflct), where M is
a trail, F is a formula, C is a clause, and cflct is a Boolean variable. A state $([\,], F_0, [\,], \bot)$
is an initial state for the input formula $F_0$.

Two new transition rules, conflict and explain, are defined in the form of relations over
states. In addition, the existing rules are updated to map four-tuple states to four-tuple
states.

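Definition 7.2 (Transition rules).

\[
\begin{aligned}
\mathsf{decide}\ (M_1, F_1, C_1, \mathit{cflct}_1)\ (M_2, F_2, C_2, \mathit{cflct}_2) \iff{}& \exists\, l.\ \mathsf{var}\ l \in \mathit{DecVars} \;\wedge\; l \notin M_1 \;\wedge\; \overline{l} \notin M_1 \;\wedge \\
& M_2 = M_1\ @\ l^{d} \;\wedge\; F_2 = F_1 \;\wedge\; C_2 = C_1 \;\wedge\; \mathit{cflct}_2 = \mathit{cflct}_1 \\[4pt]
\mathsf{unitPropagate}\ (M_1, F_1, C_1, \mathit{cflct}_1)\ (M_2, F_2, C_2, \mathit{cflct}_2) \iff{}& \exists\, c\ l.\ F_1 \vDash c \;\wedge\; \mathsf{var}\ l \in \mathit{Vars} \;\wedge\; \mathsf{isUnit}\ c\ l\ M_1 \;\wedge \\
& M_2 = M_1\ @\ l^{i} \;\wedge\; F_2 = F_1 \;\wedge\; C_2 = C_1 \;\wedge\; \mathit{cflct}_2 = \mathit{cflct}_1 \\[4pt]
\mathsf{conflict}\ (M_1, F_1, C_1, \mathit{cflct}_1)\ (M_2, F_2, C_2, \mathit{cflct}_2) \iff{}& \exists\, c.\ \mathit{cflct}_1 = \bot \;\wedge\; F_1 \vDash c \;\wedge\; M_1 \vDash\!\neg\, c \;\wedge \\
& M_2 = M_1 \;\wedge\; F_2 = F_1 \;\wedge\; C_2 = c \;\wedge\; \mathit{cflct}_2 = \top \\[4pt]
\mathsf{explain}\ (M_1, F_1, C_1, \mathit{cflct}_1)\ (M_2, F_2, C_2, \mathit{cflct}_2) \iff{}& \exists\, l\ c.\ \mathit{cflct}_1 = \top \;\wedge\; l \in C_1 \;\wedge\; \mathsf{isReason}\ c\ \overline{l}\ M_1 \;\wedge\; F_1 \vDash c \;\wedge \\
& M_2 = M_1 \;\wedge\; F_2 = F_1 \;\wedge\; C_2 = \mathsf{resolve}\ C_1\ c\ l \;\wedge\; \mathit{cflct}_2 = \top \\[4pt]
\mathsf{backjump}\ (M_1, F_1, C_1, \mathit{cflct}_1)\ (M_2, F_2, C_2, \mathit{cflct}_2) \iff{}& \exists\, l\ \mathit{level}.\ \mathit{cflct}_1 = \top \;\wedge\; \mathsf{isBackjumpLevel}\ \mathit{level}\ l\ C_1\ M_1 \;\wedge \\
& M_2 = (\mathsf{prefixToLevel}\ \mathit{level}\ M_1)\ @\ l^{i} \;\wedge\; F_2 = F_1 \;\wedge \\
& C_2 = [\,] \;\wedge\; \mathit{cflct}_2 = \bot \\[4pt]
\mathsf{learn}\ (M_1, F_1, C_1, \mathit{cflct}_1)\ (M_2, F_2, C_2, \mathit{cflct}_2) \iff{}& \mathit{cflct}_1 = \top \;\wedge\; C_1 \notin F_1 \;\wedge \\
& M_2 = M_1 \;\wedge\; F_2 = F_1\ @\ C_1 \;\wedge\; C_2 = C_1 \;\wedge\; \mathit{cflct}_2 = \mathit{cflct}_1
\end{aligned}
\]

The relation $\to_c$ is defined as in Definition 4.3, but using the above list of rules. The
definition of outcome states also has to be updated.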

Definition 7.3 (Outcome states). A state is an accepting state if cflct = $\bot$, $M \nvDash\!\neg\, F$ and
there is no literal l such that var l $\in$ DecVars, $l \notin M$ and $\overline{l} \notin M$.
A state is a rejecting state if cflct = $\top$ and C = [].

Example 7.4. Let $F_0$ be a formula from Example 4.5. A possible $\to_c$ trace (shown up to
the first application of backjump) is given below (due to the lack of space, the F component of
the state is not shown).



rule                                        M                                            cflct   C
                                            []                                           ⊥       []
decide (l = +1)                             [+1^d]                                       ⊥       []
unitPropagate (c = [-1, +2], l = +2)        [+1^d, +2^i]                                 ⊥       []
unitPropagate (c = [-2, +3], l = +3)        [+1^d, +2^i, +3^i]                           ⊥       []
decide (l = +4)                             [+1^d, +2^i, +3^i, +4^d]                     ⊥       []
decide (l = +5)                             [+1^d, +2^i, +3^i, +4^d, +5^d]               ⊥       []
unitPropagate (c = [-5, +6], l = +6)        [+1^d, +2^i, +3^i, +4^d, +5^d, +6^i]         ⊥       []
unitPropagate (c = [-2, -5, +7], l = +7)    [+1^d, +2^i, +3^i, +4^d, +5^d, +6^i, +7^i]   ⊥       []
conflict (c = [-3, -6, -7])                 [+1^d, +2^i, +3^i, +4^d, +5^d, +6^i, +7^i]   ⊤       [-3, -6, -7]
explain (l = -7, c = [-2, -5, +7])          [+1^d, +2^i, +3^i, +4^d, +5^d, +6^i, +7^i]   ⊤       [-2, -3, -5, -6]
explain (l = -6, c = [-5, +6])              [+1^d, +2^i, +3^i, +4^d, +5^d, +6^i, +7^i]   ⊤       [-2, -3, -5]
learn (c = [-2, -3, -5])                    [+1^d, +2^i, +3^i, +4^d, +5^d, +6^i, +7^i]   ⊤       [-2, -3, -5]
backjump (c = [-2, -3, -5], l = -5)         [+1^d, +2^i, +3^i, -5^i]                     ⊥       []
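
The two explain steps of this trace are plain resolution steps and can be replayed with a short sketch. The function below is an illustrative stand-in for the resolve operation (clauses as lists of integers, result sorted by variable only to make the output easy to compare with the trace); it is not the definition used in the formalization.

```python
from typing import List


def resolve(c1: List[int], c2: List[int], l: int) -> List[int]:
    """Resolvent of c1 (containing l) and c2 (containing -l), without duplicates."""
    merged = {x for x in c1 if x != l} | {x for x in c2 if x != -l}
    return sorted(merged, key=abs)


conflict_clause = [-3, -6, -7]
# explain (l = -7, c = [-2, -5, +7])
step1 = resolve(conflict_clause, [-2, -5, 7], -7)
# explain (l = -6, c = [-5, +6])
step2 = resolve(step1, [-5, 6], -6)
```

Each step removes the literal being explained and adds the remaining literals of its reason clause, all of which are false in the trail.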







7.2. Unique Implication Points (UIP). SAT solvers employ different strategies for
conflict analysis. The most widely used is the 1-UIP strategy, relying on the concept of unique
implication points (UIP) (often expressed in terms of implication graphs [MSS99]).
Informally, a clause c, false in the trail M, satisfies the UIP condition if there is exactly one
literal in c that is on the highest decision level of M. The UIP condition is very easy to
check. The 1-UIP strategy requires that the rule explain is always applied to the last literal
false in M among literals from c, and that backjumping is applied as soon as c satisfies the
UIP condition.

Definition 7.5 (Unique implication point). A clause c that is false in M has a unique
implication point, denoted by isUIP l c M, if the level of the last literal l from c that is false
in M is strictly greater than the level of the remaining literals from c that are false in M:

\[
\begin{aligned}
\mathsf{isUIP}\ l\ c\ M \iff{}& M \vDash\!\neg\, c \;\wedge\; \overline{l} = \mathsf{lastAssertedLiteral}\ \overline{c}\ M \;\wedge \\
& \forall\, l'.\ l' \in c \setminus l \longrightarrow \mathsf{level}\ \overline{l'}\ M < \mathsf{level}\ \overline{l}\ M
\end{aligned}
\]

The following lemma shows that, if there are decision literals in M and a clause has a
unique implication point, then there is a corresponding backjump level and, consequently,
the backjump rule is applicable.

Lemma 7.6. If unique M (i.e., $Inv_{unique}$ holds), then

\[
\mathsf{isUIP}\ l\ c\ M \;\wedge\; \mathsf{level}\ \overline{l}\ M > 0 \iff \exists\, \mathit{level}.\ \mathsf{isBackjumpLevel}\ \mathit{level}\ l\ c\ M
\]

Therefore, the guard isBackjumpLevel level l c M in the definition of the backjump rule
can be replaced by the stronger conditions isUIP l c M and $\mathsf{level}\ \overline{l}\ M > 0$. In that case, the
backjump level level has to be explicitly calculated (as in the proof of the previous lemma).

The UIP condition is trivially satisfied when the clause c consists only of opposites of
decision literals from the trail (a similar construction of c was already used in the proof of
Lemma 5.9).

Lemma 7.7. If it holds that:

(1) unique M (i.e., $Inv_{unique}$ holds),

(2) $\overline{c} \subseteq$ decisions M,

(3) $\overline{l}$ = lastAssertedLiteral $\overline{c}$ M,

then isUIP l c M.
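
The UIP condition and the calculation of a backjump level from it can be sketched as follows. The trail representation and the function names are assumptions for illustration. The level returned is the maximum of the levels of the remaining literals (0 if there are none), which is one natural choice and coincides with the minimal backjump level of Definition 5.6; the formal proof of Lemma 7.6 may construct the level differently.

```python
from typing import List, Optional, Tuple

Trail = List[Tuple[int, bool]]  # assumed: (literal, is_decision) pairs, no repeats


def level(lit: int, trail: Trail) -> int:
    """Number of decision literals in the trail up to and including lit."""
    count = 0
    for l, is_decision in trail:
        if is_decision:
            count += 1
        if l == lit:
            return count
    raise ValueError(f"literal {lit} is not in the trail")


def is_uip(l: int, clause: List[int], trail: Trail) -> bool:
    on_trail = [x for x, _ in trail]
    falsified = [-x for x in clause]
    if l not in clause or any(x not in on_trail for x in falsified):
        return False                              # the clause must be false in the trail
    if max(falsified, key=on_trail.index) != -l:
        return False                              # the opposite of l must be asserted last
    top = level(-l, trail)
    return all(level(-x, trail) < top for x in clause if x != l)


def uip_backjump_level(l: int, clause: List[int], trail: Trail) -> Optional[int]:
    """Backjump level for a UIP clause, or None if backjumping is not possible."""
    if not is_uip(l, clause, trail) or level(-l, trail) == 0:
        return None
    return max((level(-x, trail) for x in clause if x != l), default=0)


trail = [(1, True), (2, False), (3, False), (4, True), (5, True), (6, False), (7, False)]
```

On this trail, the clauses [-3, -6, -7] and [-2, -3, -5, -6] do not satisfy the UIP condition (each has two literals falsified at level 3), while [-2, -3, -5] does, with backjump level 1.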

7.3. Properties. Properties of the new transition system will again be proved using the
invariants introduced in Section 4, but they have to be updated to reflect the new definition
of states. In addition, three new invariants will be used.

7.3.1. Invariants. In addition to the invariants from Section 4, three new invariants are
used.

\[
\begin{aligned}
Inv_{Cfalse}:\quad & \mathit{cflct} \longrightarrow M \vDash\!\neg\, C \\
Inv_{Centailed}:\quad & \mathit{cflct} \longrightarrow F \vDash C \\
Inv_{reasonClauses}:\quad & \forall\, l.\ l \in M \;\wedge\; l \notin \mathsf{decisions}\ M \longrightarrow \exists\, c.\ \mathsf{isReason}\ c\ l\ M \;\wedge\; F \vDash c
\end{aligned}
\]

The first two invariants ensure that during the conflict analysis process, the conflict
analysis clause C is false in M and is a consequence of F. The third invariant
ensures existence of clauses that are reasons of literal propagation (these clauses enable
application of the explain rule). By the rules unitPropagate and backjump, literals are added
to M only as implied literals, and in both cases propagation is performed using a clause
that is a reason for propagation, so this clause can be associated with the implied literal, and
afterwards used as its reason.

Lemma 4.6 again has to be updated to address the new rules and its proof has to be modified
to reflect the changes in the definition of the relation $\to_c$.

7.3.2. Soundness. Although the soundness proof for unsatisfiable formulae could again be
based on Lemma 4.8, this time it will be proved in an alternative, simpler way (that does
not rely on the invariant $Inv_{impliedLits}$), which was not possible in previous sections.

Lemma 7.8. If there is a rejecting state (M, F, C, cflct) such that it holds

(1) $F \equiv F_0$ (i.e., $Inv_{equiv}$ holds),

(2) $\mathit{cflct} \longrightarrow F \vDash C$ (i.e., $Inv_{Centailed}$ holds),

then $F_0$ is unsatisfiable (i.e., $\neg(\mathsf{sat}\ F_0)$).

Theorem 7.9 (Soundness for $\to_c$). If $([\,], F_0, [\,], \bot) \to_c^{*} (M, F, C, \mathit{cflct})$, then:

(1) If DecVars $\supseteq$ vars $F_0$ and (M, F, C, cflct) is an accepting state, then the formula $F_0$ is
satisfiable and M is its model (i.e., sat $F_0$ and model M $F_0$).

(2) If (M, F, C, cflct) is a rejecting state, then the formula $F_0$ is unsatisfiable (i.e., $\neg(\mathsf{sat}\ F_0)$).

Proof. By Lemma 4.6, all the invariants hold in the state (M, F, C, cflct).

(1) All conditions of Lemma 4.7 are met (adapted to the new definition of state), so sat $F_0$
and model M $F_0$.

(2) All conditions of Lemma 7.8 are met, so $\neg(\mathsf{sat}\ F_0)$. □

7.3.3. Termination. Termination of the system with conflict analysis will be proved by using
a suitable well-founded ordering that is compatible with the relation $\to_c$, i.e., an ordering $\succ$
such that $s \to_c s'$ yields $s \succ s'$, for any two states s and s'. This ordering will be constructed
as a lexicographic combination of four simpler orderings, one for each state component.

The rules decide, unitPropagate, and backjump change M (backjump also resets C and cflct).
If a state s is in one of these relations with the state s', then $M \succ_{tr|Vars} M'$ (for the ordering
$\succ_{tr|Vars}$, introduced in Section 4.2.3).

The ordering $\succ_{tr|Vars}$ cannot be used alone for proving termination of the system, since
the rules conflict, explain, and learn do not change M (and, hence, if a state s is transformed
into a state s' by one of these rules, then it does not hold that $M \succ_{tr|Vars} M'$). For each of
these rules, a specific well-founded ordering will be constructed and it will be proved that
these rules decrease state components with respect to those orderings.

The ordering $\succ_{bool}$ will be used for handling the state component cflct and the rule
conflict (the rule explain does not change cflct; it changes the state component C, so it will
be handled by another ordering). The following properties of the ordering $\succ_{bool}$ are
proved trivially.

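Definition 7.10 ($\succ_{bool}$). $b_1 \succ_{bool} b_2$ iff $b_1 = \bot \;\wedge\; b_2 = \top$.

Lemma 7.11. If conflict $(M_1, F_1, C_1, \mathit{cflct}_1)$ $(M_2, F_2, C_2, \mathit{cflct}_2)$, then $\mathit{cflct}_1 \succ_{bool} \mathit{cflct}_2$.

Lemma 7.12. The ordering $\succ_{bool}$ is well-founded.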

An ordering over clauses (that are the third component of the states) should be
constructed such that the rule explain decreases the state component C with respect to that
ordering. Informally, after each application of the rule explain, a literal l of the clause C
that is (by $Inv_{Cfalse}$) false in M is replaced by several other literals that are again false in
M, but for them it holds that their opposite literals precede the literal $\overline{l}$ in M (since reason
clauses are used). Therefore, the ordering of literals in the trail M defines an ordering of
clauses false in M. The ordering over clauses will be a multiset extension of the relation $\prec_M$
induced by the ordering of literals in M (Definition 3.1). Each explanation step removes a
literal from C and replaces it with several literals that precede it in M. To avoid multiple
occurrences of a literal in C, duplicates are removed. Solvers usually perform this operation
explicitly and maintain the condition that C does not contain duplicates. However, our
ordering does not require this restriction and termination is ensured even without it.

Definition 7.13 ($\succ^{M}_{Cla}$). For a trail M, $C_1 \succ^{M}_{Cla} C_2$ iff $(\mathsf{remDups}\ \overline{C_2}) \prec_M^{mult} (\mathsf{remDups}\ \overline{C_1})$.

Lemma 7.14. For any trail M, the ordering $\succ^{M}_{Cla}$ is well-founded.

The following lemma ensures that each explanation step decreases the conflict clause in
the ordering $\succ^{M}_{Cla}$, for the current trail M. This ensures that each application of the explain
rule decreases the state with respect to this ordering.

Lemma 7.15. If $l \in C$ and isReason c $\overline{l}$ M, then $C \succ^{M}_{Cla} \mathsf{resolve}\ C\ c\ l$.

Lemma 7.16. If explain $(M, F, C_1, \mathit{cflct})$ $(M, F, C_2, \mathit{cflct})$, then $C_1 \succ^{M}_{Cla} C_2$.
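
The decrease of the conflict clause under explain can be checked on the trace of Example 7.4 with a small sketch. It relies on a standard fact that is assumed here rather than taken from the paper: for a strict total order, the multiset extension agrees with the lexicographic comparison of the elements listed in decreasing order. The trail is given as a plain list of literals without repeats, and the function names are hypothetical.

```python
from typing import List, Tuple


def positions_desc(clause: List[int], trail_literals: List[int]) -> Tuple[int, ...]:
    """Trail positions of the opposites of the clause literals, duplicates
    removed, listed from the latest position to the earliest."""
    position = {lit: i for i, lit in enumerate(trail_literals)}
    return tuple(sorted({position[-l] for l in clause}, reverse=True))


def clause_greater(c1: List[int], c2: List[int], trail_literals: List[int]) -> bool:
    """c1 is strictly greater than c2 in the clause ordering for this trail."""
    return positions_desc(c2, trail_literals) < positions_desc(c1, trail_literals)


trail_literals = [1, 2, 3, 4, 5, 6, 7]
c0 = [-3, -6, -7]          # clause set by conflict
c1 = [-2, -3, -5, -6]      # after explain (l = -7)
c2 = [-2, -3, -5]          # after explain (l = -6)
decreasing = clause_greater(c0, c1, trail_literals) and clause_greater(c1, c2, trail_literals)
```

Although the first explain step makes the clause longer, the clause still decreases, because the removed literal is falsified later in the trail than every literal that replaces it.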

The rule learn changes the state component F (i.e., it adds a clause to the formula) and
it requires constructing an ordering over formulae.

Definition 7.17 ($\succ^{C}_{Form}$). For any clause C, $F_1 \succ^{C}_{Form} F_2$ iff $C \notin F_1 \;\wedge\; C \in F_2$.

Lemma 7.18. For any clause C, the ordering $\succ^{C}_{Form}$ is well-founded.

By the definition of the learn rule, it holds that $C \notin F_1$ and $C \in F_2$, so the following
lemma trivially holds.

Lemma 7.19. If learn $(M, F_1, C, \mathit{cflct})$ $(M, F_2, C, \mathit{cflct})$, then $F_1 \succ^{C}_{Form} F_2$.

Theorem 7.20 (Termination for $\to_c$). If the set DecVars is finite, for any formula $F_0$, the
relation $\to_c$ is well-founded on the set of states s such that $s_0 \to_c^{*} s$, where $s_0$ is the initial
state for $F_0$.

Proof. Let $\succ$ be a (parametrized) lexicographic product (Definition [?]), i.e., let

\[
\succ \;=\; \succ_{tr|Vars} \;\langle{*}\mathit{lex}{*}\rangle\; \succ_{bool} \;\langle{*}\mathit{lexP}{*}\rangle\; (\lambda s.\ \succ^{M_s}_{Cla}) \;\langle{*}\mathit{lexP}{*}\rangle\; (\lambda s.\ \succ^{C_s}_{Form}),
\]

where $M_s$ is the trail in the state s, and $C_s$ is the conflict clause in the state s. By Proposition
3.4 and Lemmas 4.16, 7.12, 7.14, and 7.18, the relation $\succ$ is well-founded. If the invariants
hold in the state $(M_1, F_1, C_1, \mathit{cflct}_1)$ and if $(M_1, F_1, C_1, \mathit{cflct}_1) \to_c (M_2, F_2, C_2, \mathit{cflct}_2)$,
then $(M_1, \mathit{cflct}_1, C_1, F_1) \succ (M_2, \mathit{cflct}_2, C_2, F_2)$. Indeed, by Lemma 4.14 the rules decide,
unitPropagate and backjump decrease M in the ordering, the rule conflict does not change
M but (by Lemma 7.11) decreases cflct, the rule explain changes neither M nor cflct, but
(by Lemma 7.16) decreases C, and the rule learn does not change M, cflct, or C, but (by
Lemma 7.19) decreases F.

Then the theorem holds by Proposition 3.4 (where f is a permutation mapping (M, F,
C, cflct) to (M, cflct, C, F)). □

7.3.4. Completeness and Correctness. Completeness requires that all final states are
outcome states, and the following two lemmas are used to prove this property.

Lemma 7.21. If for the state (M, F, C, cflct) it holds that:

(1) cflct = $\top$,

(2) unique M (i.e., $Inv_{unique}$ holds),

(3) $\mathit{cflct} \longrightarrow M \vDash\!\neg\, C$ (i.e., $Inv_{Cfalse}$ holds),

(4) the rules explain and backjump are not applicable,

then the state (M, F, C, cflct) is a rejecting state and C = [].

Lemma 7.22. If in the state (M, F, C, cflct) it holds that cflct = $\bot$ and the rule conflict is
not applicable, then the state (M, F, C, cflct) is an accepting state and $M \nvDash\!\neg\, F$.

Theorem 7.23 (Completeness for $\to_c$). For any formula $F_0$, if $([\,], F_0, [\,], \bot) \to_c^{*} (M, F, C, \mathit{cflct})$
and if the state (M, F, C, cflct) is final, then it is either accepting or rejecting.

Proof. Since the state (M, F, C, cflct) is reachable from the initial state, by Lemma 4.6
the invariants hold in this state, including unique M (i.e., $Inv_{unique}$), and $\mathit{cflct} \longrightarrow M \vDash\!\neg\, C$
(i.e., $Inv_{Cfalse}$). In the state (M, F, C, cflct), it holds that either cflct = $\top$ or cflct = $\bot$. If
cflct = $\bot$, since the rules decide and conflict are not applicable (as the state is final), by
Lemma 7.22 the state (M, F, C, cflct) is an accepting state. If cflct = $\top$, since the rules explain
and backjump are not applicable (as the state is final), by Lemma 7.21 the state is a rejecting
state. □

Correctness of the system is proved in analogy with Theorem 4.19.

8. Restarting and Forgetting 

In this section we extend the previous system with restarting and forgetting. The most
challenging task with restarting is to ensure termination.

Many solvers use restarting and forgetting schemes that apply restarting with
increasing periodicity, and there are theoretical results ensuring total correctness of these
schemes [KG07, NOT06]. However, modern solvers also use aggressive restarting schemes
(e.g., Luby restarts) that apply the restart rule very frequently, but there are no corresponding
theoretical results that ensure termination of these schemes. In this section we will
formulate a system that allows application of the restart rule after each conflict and show
that this (weakly constrained, hence potentially extremely frequent) scheme also ensures
termination.

8.1. States and Rules. Unlike previous systems that tend to be as abstract as possible,
this system aims to precisely describe the behaviour of modern SAT solvers. For example,
only learnt clauses can be forgotten. So, to aid the forget rule, the formula is split into the
initial part $F_0$ and the learnt clauses Fl. Since the input formula $F_0$ is fixed, it is not a
part of the state anymore, but rather an input parameter. The new component of the state,
the lnt flag, has a role in ensuring termination by preventing restart and forget from being
applied twice without learning a clause in between. In addition, some changes in the rules
ensure termination of some variants of the system. Unit propagation is performed eagerly,
i.e., decide is not applied when there is a unit clause present. Also, backjumping is always
performed to the minimal backjump level (Definition 5.6). These stronger conditions are
very often obeyed in real SAT solver implementations, and so this system is still a faithful
model of them.

Definition 8.1 (State). A state of the system is a five-tuple (M, Fl, C, cflct, lnt), where M
is a trail, Fl is a formula, C is a clause, and cflct and lnt are Boolean variables. A state
$([\,], [\,], [\,], \bot, \bot)$ is an initial state for the input formula $F_0$.

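Definition 8.2 (Transition rules). In all rules below, the first state is
$(M_1, Fl_1, C_1, \mathit{cflct}_1, \mathit{lnt}_1)$ and the second state is $(M_2, Fl_2, C_2, \mathit{cflct}_2, \mathit{lnt}_2)$.

\[
\begin{aligned}
\mathsf{decide} \iff{}& \exists\, l.\ \mathsf{var}\ l \in \mathit{DecVars} \;\wedge\; l \notin M_1 \;\wedge\; \overline{l} \notin M_1 \;\wedge \\
& \neg(\exists\, c\ l.\ c \in F_0\ @\ Fl_1 \;\wedge\; \mathsf{isUnitClause}\ c\ l\ M_1) \;\wedge \\
& M_2 = M_1\ @\ l^{d} \;\wedge\; Fl_2 = Fl_1 \;\wedge\; C_2 = C_1 \;\wedge\; \mathit{cflct}_2 = \mathit{cflct}_1 \;\wedge\; \mathit{lnt}_2 = \mathit{lnt}_1 \\[4pt]
\mathsf{unitPropagate} \iff{}& \exists\, c\ l.\ c \in F_0\ @\ Fl_1 \;\wedge\; \mathsf{isUnit}\ c\ l\ M_1 \;\wedge \\
& M_2 = M_1\ @\ l^{i} \;\wedge\; Fl_2 = Fl_1 \;\wedge\; C_2 = C_1 \;\wedge\; \mathit{cflct}_2 = \mathit{cflct}_1 \;\wedge\; \mathit{lnt}_2 = \mathit{lnt}_1 \\[4pt]
\mathsf{conflict} \iff{}& \exists\, c.\ \mathit{cflct}_1 = \bot \;\wedge\; c \in F_0\ @\ Fl_1 \;\wedge\; M_1 \vDash\!\neg\, c \;\wedge \\
& M_2 = M_1 \;\wedge\; Fl_2 = Fl_1 \;\wedge\; C_2 = c \;\wedge\; \mathit{cflct}_2 = \top \;\wedge\; \mathit{lnt}_2 = \mathit{lnt}_1 \\[4pt]
\mathsf{explain} \iff{}& \exists\, l\ c.\ \mathit{cflct}_1 = \top \;\wedge\; l \in C_1 \;\wedge\; \mathsf{isReason}\ c\ \overline{l}\ M_1 \;\wedge\; c \in F_0\ @\ Fl_1 \;\wedge \\
& M_2 = M_1 \;\wedge\; Fl_2 = Fl_1 \;\wedge\; C_2 = \mathsf{resolve}\ C_1\ c\ l \;\wedge\; \mathit{cflct}_2 = \top \;\wedge\; \mathit{lnt}_2 = \mathit{lnt}_1 \\[4pt]
\mathsf{backjumpLearn} \iff{}& \exists\, c\ l\ \mathit{level}.\ \mathit{cflct}_1 = \top \;\wedge\; \mathsf{isMinimalBackjumpLevel}\ \mathit{level}\ l\ C_1\ M_1 \;\wedge \\
& M_2 = (\mathsf{prefixToLevel}\ \mathit{level}\ M_1)\ @\ l^{i} \;\wedge\; Fl_2 = Fl_1\ @\ [C_1] \;\wedge \\
& C_2 = [\,] \;\wedge\; \mathit{cflct}_2 = \bot \;\wedge\; \mathit{lnt}_2 = \top \\[4pt]
\mathsf{forget} \iff{}& \exists\, Fc.\ \mathit{cflct}_1 = \bot \;\wedge\; \mathit{lnt}_1 = \top \;\wedge \\
& Fc \subseteq Fl_1 \;\wedge\; (\forall\, c \in Fc.\ \neg(\exists\, l.\ \mathsf{isReason}\ c\ l\ M_1)) \;\wedge \\
& Fl_2 = Fl_1 \setminus Fc \;\wedge\; M_2 = M_1 \;\wedge\; C_2 = C_1 \;\wedge\; \mathit{cflct}_2 = \mathit{cflct}_1 \;\wedge\; \mathit{lnt}_2 = \bot \\[4pt]
\mathsf{restart} \iff{}& \mathit{cflct}_1 = \bot \;\wedge\; \mathit{lnt}_1 = \top \;\wedge \\
& M_2 = \mathsf{prefixToLevel}\ 0\ M_1 \;\wedge\; Fl_2 = Fl_1 \;\wedge\; C_2 = C_1 \;\wedge\; \mathit{cflct}_2 = \mathit{cflct}_1 \;\wedge\; \mathit{lnt}_2 = \bot
\end{aligned}
\]

These rules will be used to formulate three different transition systems. The system
$\to_r$ consists of all rules except restart, the system $\to_f$ consists of all rules except forget, and
the system $\to$ consists of all rules.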
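
The role of the lnt flag in the guards of restart and forget can be illustrated with a small sketch. The State class, the tuple-based representations, and the reading of isReason used here (the literal occurs in the clause and in the trail, and the opposites of all other clause literals precede it in the trail) are assumptions made for illustration, not the definitions of the formalization.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Optional, Tuple

Clause = Tuple[int, ...]


@dataclass(frozen=True)
class State:
    trail: Tuple[Tuple[int, bool], ...]   # M: (literal, is_decision) pairs
    learnt: Tuple[Clause, ...]            # Fl: learnt clauses
    conflict_clause: Clause               # C
    cflct: bool
    lnt: bool


def is_reason(clause: Clause, lit: int, trail) -> bool:
    """Assumed reading of isReason: clause forced lit at the point it was added."""
    lits = [l for l, _ in trail]
    if lit not in clause or lit not in lits:
        return False
    before = set(lits[:lits.index(lit)])
    return all(-x in before for x in clause if x != lit)


def restart(s: State) -> Optional[State]:
    if s.cflct or not s.lnt:
        return None                       # guard: cflct = False and lnt = True
    level0 = []
    for l, is_decision in s.trail:
        if is_decision:
            break
        level0.append((l, is_decision))   # keep only the level-0 prefix
    return replace(s, trail=tuple(level0), lnt=False)


def forget(s: State, to_drop: Iterable[Clause]) -> Optional[State]:
    to_drop = list(to_drop)
    if s.cflct or not s.lnt:
        return None                       # same guard as restart
    if not set(to_drop) <= set(s.learnt):
        return None                       # only learnt clauses may be forgotten
    if any(is_reason(c, l, s.trail) for c in to_drop for l, _ in s.trail):
        return None                       # reason clauses must be kept
    kept = tuple(c for c in s.learnt if c not in to_drop)
    return replace(s, learnt=kept, lnt=False)


s = State(trail=((-1, False), (2, True), (3, False)),
          learnt=((-2, -3, -5),), conflict_clause=(), cflct=False, lnt=True)
r = restart(s)
```

After one restart (or one forget) the flag is cleared, so neither rule can fire again until backjumpLearn sets it; in the sketch, `restart(r)` and `forget(r, ...)` both return `None`.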

8.2. Properties. The structure of the invariants and the proofs of the properties of the
system are basically similar to those given in Section 7, while the termination proof requires
a number of new insights.

8.2.1. Invariants. All invariants formulated so far hold, but the formula F, not present in
the new state, has to be replaced by $F_0\ @\ Fl$.

8.2.2. Termination. Termination of the system without restarts is proved first.

Theorem 8.3 (Termination for $\to_r$). If the set DecVars is finite, for any formula $F_0$, the
relation $\to_r$ is well-founded on the set of states s such that $s_0 \to_r^{*} s$, where $s_0$ is the initial
state for $F_0$.

Proof. Let $\succ$ be a (parametrized) lexicographic product (Definition [?]), i.e., let

\[
\succ \;=\; \succ_{tr|Vars} \;\langle{*}\mathit{lex}{*}\rangle\; \succ_{bool} \;\langle{*}\mathit{lexP}{*}\rangle\; (\lambda s.\ \succ^{M_s}_{Cla}) \;\langle{*}\mathit{lex}{*}\rangle\; \succ_{bool},
\]

where $M_s$ is the trail in the state s. By Proposition 3.4 and Lemmas 4.16, 7.12, and 7.14,
the relation $\succ$ is well-founded. If the state $(M_1, Fl_1, C_1, \mathit{cflct}_1, \mathit{lnt}_1)$ satisfies the invariants
and if $(M_1, Fl_1, C_1, \mathit{cflct}_1, \mathit{lnt}_1) \to_r (M_2, Fl_2, C_2, \mathit{cflct}_2, \mathit{lnt}_2)$, then $(M_1, \mathit{cflct}_1, C_1, \neg \mathit{lnt}_1) \succ
(M_2, \mathit{cflct}_2, C_2, \neg \mathit{lnt}_2)$. Indeed, by Lemma 4.14 the rules unitPropagate, decide and
backjumpLearn decrease M, the rule conflict does not change M but (by Lemma 7.11) decreases
cflct, the rule explain changes neither M nor cflct, but (by Lemma 7.16) decreases C, and
the rule forget does not change M, cflct, or C, but decreases $\neg \mathit{lnt}$.

From the above, the theorem holds by Proposition 3.4 (for a suitable f). □

The termination proof of the system without forgets is more involved. We define a
(not necessarily well-founded) ordering of the formulae by inclusion and its restriction with
respect to the set of variables occurring in the formula.

Definition 8.4 ($\succ_{Form\subset}$). $F_1 \succ_{Form\subset} F_2$ iff $F_1 \subset F_2$.

Definition 8.5 ($\succ_{Form\subset|Vbl}$). $F_1 \succ_{Form\subset|Vbl} F_2$ iff vars $F_1 \subseteq Vbl$ $\wedge$ vars $F_2 \subseteq Vbl$ $\wedge$
$\widehat{F_1} \succ_{Form\subset} \widehat{F_2}$, where $\widehat{F}$ denotes the formula obtained from F by removing duplicate
literals from clauses and removing duplicate clauses.

Lemma 8.6. If the set Vbl is finite, then the relation $\succ_{Form\subset|Vbl}$ is well-founded.

The following lemma states that if unit propagation is done eagerly and if backjumping
is always performed to the minimal backjump level, then the clauses that are learnt are
always fresh, i.e., they do not belong to the current formula.

Lemma 8.7. If $s_0$ is an initial state, $s_0 \to_f^{*} s_A$ and backjumpLearn $s_A$ $s_B$, where $s_A =
(M_A, Fl_A, C_A, \top, \mathit{lnt}_A)$, then $C_A \notin F_0\ @\ Fl_A$.

Therefore, backjumpLearn increases the formula in the inclusion ordering.

Lemma 8.8. If $s_0 \to_f^{*} s_A$ and backjumpLearn $s_A$ $s_B$ for the initial state $s_0$ and states $s_A$ and
$s_B$, then $F_0\ @\ Fl_A \succ_{Form\subset|Vars} F_0\ @\ Fl_B$, where $Fl_A$ and $Fl_B$ are the formulae in the states $s_A$
and $s_B$.

Theorem 8.9 (Termination for $\to_f$). If the set DecVars is finite, for any formula $F_0$, the
relation $\to_f$ is well-founded on the set of states s such that $s_0 \to_f^{*} s$, where $s_0$ is the initial
state for $F_0$.

Proof. Let $\succ$ be a (parametrized) lexicographic product (Definition [?]), i.e., let

\[
\succ \;=\; \succ_{Form\subset|Vars} \;\langle{*}\mathit{lex}{*}\rangle\; \succ_{bool} \;\langle{*}\mathit{lex}{*}\rangle\; \succ_{tr|Vars} \;\langle{*}\mathit{lex}{*}\rangle\; \succ_{bool} \;\langle{*}\mathit{lexP}{*}\rangle\; (\lambda s.\ \succ^{M_s}_{Cla}),
\]

where $M_s$ is the trail in the state s. By Proposition 3.4 and Lemmas 4.16, 7.12, 7.14, and 8.6,
the relation $\succ$ is well-founded. If the state $(M_1, Fl_1, C_1, \mathit{cflct}_1, \mathit{lnt}_1)$ satisfies the invariants
and if $(M_1, Fl_1, C_1, \mathit{cflct}_1, \mathit{lnt}_1) \to_f (M_2, Fl_2, C_2, \mathit{cflct}_2, \mathit{lnt}_2)$, then $(F_1, \neg \mathit{lnt}_1, M_1, \mathit{cflct}_1, C_1) \succ
(F_2, \neg \mathit{lnt}_2, M_2, \mathit{cflct}_2, C_2)$, where $F_i$ denotes $F_0\ @\ Fl_i$. Indeed, by Lemma 8.8 the rule
backjumpLearn decreases F, the rule restart does not change F but decreases $\neg \mathit{lnt}$, the rules
unitPropagate and decide do not change F and lnt but (by Lemma 4.14) decrease M, the rule
conflict does not change F, lnt, or M but (by Lemma 7.11) decreases cflct, and the rule explain
does not change F, lnt, M, or cflct, but (by Lemma 7.16) decreases C.

From the above, the theorem holds by Proposition 3.4 (for a suitable f). □

If both forget and restart are allowed, then the system is not terminating. 

Theorem 8.10. The relation $\to$ is not well-founded on the set of states reachable from the
initial state.

Proof. Consider the formula [[-1,-2,3], [-1,-2,4], [-1,-3,-4], [-5,-6,7], [-5,-6,8], 
[-5,-7,-8]]. The following derivation chain (for simplicity, not all components of the 
states are shown) proves that the relation is cyclic. 
Therefore, it holds that

$([\,], [\,], [\,], \bot, \bot) \to^* ([\,], [[-1, -2]], [\,], \bot, \bot) \to^+ ([\,], [[-1, -2]], [\,], \bot, \bot)$. □

However, if there are additional restrictions on the rule application policy, the system
may be terminating. Since the number of different states for the input formula $F_0$ is finite
(when duplicate clauses and literals are removed), there is a number $n_f$ (dependent on
$F_0$) such that there is no chain of rule applications without forget longer than $n_f$ ($\to_r$ is
well-founded and therefore acyclic, so, on a finite set, there must exist $n_f$ such that $\to_r^{n_f}$ is
empty). Similarly, there is a number $n_r$ (dependent on $F_0$) such that there is no chain of
rule applications without restart longer than $n_r$. So, termination is ensured for any policy
that guarantees that there is a point where the application of forget will be forbidden for at
least $n_f$ steps or that there is a point where the application of restart will be forbidden for
at least $n_r$ steps.
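
Since the bound on forget-free chains is not known in advance, one way to satisfy such a policy is to let the forget-free windows grow without bound, so that some window eventually reaches the bound. The following Python sketch illustrates this idea. The class `ForgetPolicy`, its method names and the doubling schedule are assumptions made for this illustration and are not part of the formalization.

```python
class ForgetPolicy:
    """Allow forget only after a forget-free window whose length doubles each time."""

    def __init__(self, initial_window: int = 1) -> None:
        self.window = initial_window  # required number of forget-free steps
        self.steps_since_forget = 0

    def on_step(self) -> None:
        """Call after every rule application other than forget."""
        self.steps_since_forget += 1

    def may_forget(self) -> bool:
        """Forget is permitted only once the current window has elapsed."""
        return self.steps_since_forget >= self.window

    def on_forget(self) -> None:
        """Call when forget is applied; the next window is twice as long."""
        if not self.may_forget():
            raise RuntimeError("forget is currently forbidden")
        self.window *= 2
        self.steps_since_forget = 0


policy = ForgetPolicy(initial_window=2)
policy.on_step()
policy.on_step()
policy.on_forget()
print(policy.window)  # 4
```

The same scheme could be applied to restart instead of forget; by the argument above, restricting either one of the two rules in this way is sufficient.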

8.2.3. Soundness, Completeness and Correctness. Soundness and completeness proofs from
previous sections hold with minor modifications necessary to adapt them to the new
definition of state and rules. The most demanding part is to update Lemma 4.6 and to prove
that the new rules maintain the invariants.

9. Related Work and Discussions 

The original DPLL procedure [DLL62] has been described in many logic textbooks, along
with informal proofs of its correctness (e.g., [DSW94]). First steps towards verification of
modern DPLL-based SAT solvers have been made only recently. Zhang and Malik have
informally proved correctness of a modern SAT solver [ZM03]. Their proof is very informal,
the specification of the solver is given in pseudo-code and it describes only one strategy for
applying rules. The authors of two abstract transition systems for SAT also give correctness
proofs [NOT06, KG07]. These specifications and the proofs are much more formal than
those given in [ZM03], but they are also not machine-verifiable and are much less rigorous
than the proofs presented in this paper.



In recent years, several machine-verifiable correctness proofs for SAT solvers were
constructed. Lescuyer and Conchon formalized, within Coq, a SAT solver based on the classical
DPLL procedure and its correctness proof [LS08]. They used a deep embedding, so this
approach enables execution of the SAT solver in Coq and, further, a reflexive tactic. Marić
and Janičić formalized a correctness proof for the classical DPLL procedure by shallow
embedding into Isabelle/HOL [MJ10]. Shankar and Vaucher formally and mechanically
verified a high-level description of a modern DPLL-based SAT solver within the system
PVS [SV09]. However, unlike this paper, which formalizes abstract descriptions for SAT,
they formalize a very specific SAT solver implementation within PVS. Marić proved
partial correctness (termination was not discussed) of an imperative pseudo-code of a modern
SAT solver using the Hoare logic approach [Mar09] and total correctness of a SAT solver
implemented in Isabelle/HOL using shallow embedding [Mar10]. Both these formalizations
use features of the transition systems described in this paper and provide links between the
transition systems and executable implementations of modern SAT solvers. In the former
approach, the verified specification can be rewritten to executable code in an imperative
programming language,¹ while in the latter approach, executable code in a functional
language can be exported from the specification by automatic means [HN10].

The transition system discussed in Section 4 corresponds to a non-recursive version
of the classical DPLL procedure. The transition systems and correctness proofs presented
in the later sections are closely related to the systems of Nieuwenhuis et al. [NOT06] and
Krstić and Goel [KG07]. However, there are some significant differences, both in the level
of precision in the proofs and in the definitions of the rules.

Informal (non-machine-verifiable) proofs allow authors some degree of imprecision. For
example, in [NOT06] and [KG07] clauses are defined as "disjunctions of literals" and
formulae as "conjunctions of clauses", and this leaves unclear some issues such as whether
duplicates are allowed. The ordering of clauses and literals is considered to be irrelevant:
in [KG07] it is said that "clauses containing the same literals in different order are
considered equal", and in [NOT06] it is not explicitly said, but only implied (e.g., clauses in the
unitPropagate rule are written as $C \vee l$, where $M \models \neg C$ and $l$ is undefined in $M$, and from
this it is clear that the order of literals must be irrelevant, or otherwise only last literals in
clauses could be propagated). Therefore, clauses and formulae are basically defined as sets
or multisets of literals. In our formal definition, clauses and formulae are defined as lists.
Although a choice whether to use lists, multisets, or sets in these basic definitions might not
seem so important, fully formal proofs show that this choice makes a very big difference.
Namely, using sets saves much effort in the proof. For example, if formulae may contain
repeated clauses, easy termination arguments like "there are finitely many different clauses
that can be learnt" cannot be applied. On the other hand, using sets makes the systems
quite different from real SAT solver implementations: eliminating duplicates from clauses
during solving is possible and cheap, but explicitly maintaining absence of duplicate clauses
from formulae may be intolerably expensive. It can be proved that absence of
duplicate clauses can be, under some conditions on the rules, implicitly guaranteed only by
eliminating duplicate clauses from formulae during initialization. Solvers typically assume
this complex fact, but it was not proved before for formulae represented by lists, while for
systems using sets this issue is irrelevant.
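
The difference between the two representations can be seen in a short Python sketch. The functions `learn_list` and `learn_set` are hypothetical illustrations of an unrestricted learning step and do not correspond to rules of the formalized systems, whose side conditions are omitted here.

```python
def learn_list(formula, clause):
    """List representation: learning always appends, even if the clause is present."""
    return formula + [clause]


def learn_set(formula, clause):
    """Set representation: learning an existing clause leaves the formula unchanged."""
    return formula | {frozenset(clause)}


f_list = [[-1, -2, 3]]
f_set = {frozenset([-1, -2, 3])}

# Learn the same clause three times in both representations
for _ in range(3):
    f_list = learn_list(f_list, [-1, -2])
    f_set = learn_set(f_set, [-1, -2])

print(len(f_list))  # 4
print(len(f_set))   # 2
```

With lists, repeated learning makes the formula grow without bound, so a termination argument has to show separately that learnt clauses are always fresh; with sets, the bound on the number of distinct clauses applies directly.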



¹ As done in the implementation of our SAT solver ArgoSAT.
The system given in [NOT06] is very close to the system given in Section 5 and later
extended in Section 6. The requirement that the set of decision literals exactly coincides
with the set of literals from the input formula is too strong and is not always present in
real SAT solvers, so it is relaxed in our system and the set DecVars is introduced (a similar
technique is used in [KG07]). Also, the definition of the backjump rule from [NOT06]
requires that there is a false clause in the formula being solved when the rule is applied,
but our formal analysis of the proofs shows that this assumption is not required, so it is
omitted from Definition 5.1. As already mentioned, the condition that the unit clauses
belong to the formula is also relaxed, and propagation can be performed over arbitrary
consequences of the formula. The invariants used in the proofs and the soundness proof
are basically the same in [NOT06] and in this paper, but the amount of detail had to
be significantly increased to reach a machine-verifiable proof. Our completeness proof is
somewhat simpler. The ordering used in the termination proof for the system with backjumping
in [NOT06] expresses a similar idea to ours, but is much more complex. A conflict analysis
process is not described within the system from [NOT06].

The system given in [KG07] is close to the system given in Section 7, with some minor
differences. Namely, in our system, instead of a set of decision literals, the set of decision
variables is considered. Also, unit, conflict and reason clauses need not be present in
the formula. The conflict set used in [KG07], along with its distinguished value no_cflct,
is here replaced by the conflict flag and a conflict clause (the conflict set is the set of
opposites of literals occurring in our conflict clauses). The underlying reasoning used in the two
total correctness proofs is the same, although in [KG07] the invariants are not explicitly
formulated and the proof is monolithic (lemmas are not present) and rather informal.

Formalization of termination proofs from both [NOT06] and [KG07] required the
greatest effort in the formalization. Although arguments like "between any two applications of
the rule ... there must be an occurrence of the rule ...", heavily used in informal
termination proofs, could be formalized, we felt that constructing explicit termination orderings is
much cleaner.

In [KG07] termination of systems with restarts is not thoroughly discussed and in
[NOT06] it is proved very informally, under a strong condition that the periodicity of restarts
is strictly increasing. This is often not the case in many modern SAT solver implementations.
In this paper, we have (formally) proved that restarting can be performed very frequently
(after each conflict) without compromising total correctness. However, some additional
requirements (unit propagation must be exhaustive, backjumping must be performed to
minimal backjumping levels, and backjump lemmas must always be learnt) are used in the
proof, but these are always present in modern SAT solvers. Although the issue has been
addressed in the literature, we are not aware of a previous proof of termination of frequent
restarting.

10. Conclusions 

We presented a formalization of modern SAT solvers and their properties in the form of 
abstract state transition systems. Several different SAT solvers are formalized — from the 
classical DPLL procedure to its modern successors. The systems are defined in a very 
abstract way so they cover a wide range of SAT solving procedures. The formalization 
is made within the Isabelle/HOL system and the total correctness properties (soundness, 
termination, completeness) are shown for each presented system. 
Central theorems claim (roughly) that a transition system, i.e., a SAT solver, terminates
and returns an answer yes if and only if the input formula is satisfiable. This whole
construction boils down to the simple definition of satisfiable formula, which can be confirmed
by manual inspection.

Our formalization builds up on the previous work on state transition systems for SAT 
and also on correctness arguments for other SAT systems. However, our formalization is 
the first that gives machine-verifiable total correctness proofs for systems that are close to 
modern SAT solvers. Also, compared to other abstract descriptions, our systems are more 
general (so can cover a wider range of possible solvers) and require weaker assumptions that 
ensure the correctness properties. Thanks to the framework of formalized mathematics, 
we explicitly separated notions of soundness and completeness, and defined all notions and 
properties relevant for SAT solving, often neglected to some extent in informal presentations. 

Our experience in the SAT verification project shows that having imperative software
modelled abstractly, in the form of abstract state transition systems, makes the verification
cleaner and more flexible. Such a model can be used as a key building block in proving
correctness of SAT solvers by other verification approaches, which significantly simplifies
the overall verification effort.